Python - Error when trying to extract bytes from file

I am currently trying to extract the raw binary bytes from a file e.g. 000001001000
f = open(r"file.z", "rb")
try:
    byte = f.read();
    print int(byte)
finally:
    f.close()
The reason I used int(byte) was to get a peek at what the string looks like. (I couldn't print it directly because of [Decode error - output not utf-8].)
Traceback (most recent call last):
  File "C:\Users\werdnakof\Downloads\test.py", line 9, in <module>
    print int(byte);
ValueError: invalid literal for int() with base 10: '\x04\x80e\x06\xc0l\x06\xf0,\x02'
It returns \x04\x80e\x06\xc0l\x06\xf0,\x02
I am not sure where to go from here. I was told the file consists of 12-bit fixed-width codes, padded on the left.
Any advice or tips on how to solve this? All I want is the 12-bit numbers, e.g. 000001001000

Use encode and bin:
bin(int(b.encode("hex"),16))
In [27]: b='\x04\x80e\x06\xc0l\x06\xf0,\x02'
In [28]: int(b.encode("hex"),16)
Out[28]: 21257928890331299851266L
In [29]: bin(int(b.encode("hex"),16))
Out[29]: '0b100100000000110010100000110110000000110110000000110111100000010110000000010'
with open("file.z","rb") as f:
    for line in f:
        print(int(line.encode("hex"), 16))

To print the contents of a binary string, you can convert it to its hex representation:
print byte.encode('hex')
For reading binary structures, you can use the struct module.

Can you try this:
f = open("file.z", "rb")
try:
    byte = f.read();
    print(bin(int(str(byte).encode("hex"),16)))
finally:
    f.close()
From Padraic Cunningham's answer
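The answers above turn the whole file into one large number, but the question asks for individual 12-bit codes. A minimal Python 3 sketch of that last step follows. It assumes the codes are packed back to back, most significant bit first; the function name split_12bit_codes is hypothetical and the input is only the excerpt shown in the traceback, so the trailing bits do not form a complete code.

```python
def split_12bit_codes(data):
    """Return (codes, leftover_bits) for 12-bit codes packed MSB-first."""
    # Render every byte as 8 binary digits and join them into one bit string.
    bits = "".join(format(b, "08b") for b in data)
    # Only whole 12-bit groups can be decoded; keep the remainder separately.
    usable = len(bits) - len(bits) % 12
    codes = [int(bits[i:i + 12], 2) for i in range(0, usable, 12)]
    return codes, bits[usable:]


# The bytes shown in the question's traceback (an excerpt, not the full file).
data = b"\x04\x80e\x06\xc0l\x06\xf0,\x02"
codes, leftover = split_12bit_codes(data)

print([format(c, "012b") for c in codes])  # first entry is '000001001000'
print(codes)                               # [72, 101, 108, 108, 111, 44]
print(leftover)                            # '00000010' (incomplete final code)
```

Working on a bit string is slow for large files but keeps the grouping easy to inspect; shifting and masking integers would be the faster variant of the same idea.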

Related

How to unpack double digit hex file with python msgPack?

I have a text file containing some data, among these data there's a JSON packed with msgPack.
I am able to unpack on https://toolslick.com/conversion/data/messagepack-to-json but I can't get to make it work in python.
This is what I have tried so far:
import msgpack
def parseAndSplit(path):
    with open(path) as f:
        fContent = f.read()
    for subf in fContent.split('Payload: '):
        '''for ssubf in subf.split('DataChunkMsg'):
            print(ssubf)'''
    return subf.split('DataChunkMsg')[0]
fpath = "path/to/file"
t = parseAndSplit(fpath)
l = t.split("-")
s = ""
for i in l:
    s=s+i
print(s)
a = msgpack.unpackb(bytes(s,"UTF-8"), raw=False)
print(a)
but the output is
Traceback (most recent call last):
File "C:/Users/Marco/PycharmProjects/codeTest/msgPack.py", line 19, in <module>
a = msgpack.unpackb(bytes(s,"UTF-8"), raw=False)
File "msgpack\_unpacker.pyx", line 202, in msgpack._cmsgpack.unpackb
msgpack.exceptions.ExtraData: unpack(b) received extra data.
9392AA6E722D736230322D3032AC4F444D44617...(string goes on)
I am quite sure that it's an encoding problem of some sort, but I am having no luck, whether in the docs or by trial and error.
Thank you very much for the attention
I found the solution in the end:
msgpack.unpackb(bytes.fromhex(hexstring))
where hexstring is the string read from the file.
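To see why bytes.fromhex helps, it may be useful to compare the two conversions side by side. The sketch below uses only the visible prefix of the string from the question (the rest is truncated there, so this is not a complete message), and the helper name hex_to_bytes is hypothetical. Encoding the text as UTF-8 yields the ASCII codes of the characters '9', '3', ...; the first byte is then 0x39, which MessagePack reads as the small integer 57, and everything after it is reported as extra data.

```python
def hex_to_bytes(text):
    """Turn a hex dump such as '93-92-AA' or '9392AA' into raw bytes."""
    return bytes.fromhex(text.replace("-", "").strip())


# Visible prefix of the payload printed in the question (remainder unknown).
hexstring = "9392AA6E722D736230322D3032"

as_text = bytes(hexstring, "UTF-8")   # what the failing code passed to unpackb
raw = hex_to_bytes(hexstring)         # what unpackb actually needs

print(hex(as_text[0]))  # 0x39, the character '9': a MessagePack integer
print(hex(raw[0]))      # 0x93: a MessagePack array header with 3 elements
print(raw[3:13])        # b'nr-sb02-02', the 10-byte string announced by 0xAA
```

With the raw bytes in hand, msgpack.unpackb(raw, raw=False) is the call from the accepted solution.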

Importing large set of numbers from a text file into python program in matrix form

I'm trying to import some pointcloud coordinates into python, the values are in a text file in this format
0.0054216 0.11349 0.040749
-0.0017447 0.11425 0.041273
-0.010661 0.11338 0.040916
0.026422 0.11499 0.032623
and so on.
I've tried doing it with two methods:
def getValue(filename):
    try:
        file = open(filename,'r')
    except IOError:
        print ('problem with file'), filename
    value = []
    for line in file:
        value.append(float(line))
    return value
I called the above code in idle but there is an error that says the string cannot be converted to float.
import numpy as np
import matplotlib.pyplot as plt
data = np.genformtxt('co.txt', delimiter=',')
With this method, when I ask for data in IDLE it says data is not defined.
Below is the error message:
>>> data[0:4]
Traceback (most recent call last):
File "<pyshell#14>", line 1, in <module>
data[0:4]
NameError: name 'data' is not defined
With the data you provided, I would say you want to use:
np.loadtxt(<yourtxt.txt>, delimiter=" ")
In this case the delimiter should be a blank space, as can be seen in your data. This works perfectly for me.
Your problem is that you are using a comma as the delimiter.
float() converts one number string, not several (read its docs)
In [32]: line = '0.0054216 0.11349 0.040749 '
In [33]: float(line)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-33-0f93140abeab> in <module>()
----> 1 float(line)
ValueError: could not convert string to float: '0.0054216 0.11349 0.040749 '
Note that the error tells us which string is giving it problems.
It works if we split the line into substrings and convert those individually.
In [34]: [float(x) for x in line.split()]
Out[34]: [0.0054216, 0.11349, 0.040749]
Similarly, genfromtxt needs to split the lines into proper substrings. There aren't any , in your file, so clearly that's the wrong delimiter. The default delimiter is white space, which works just fine in this case.
In [35]: data = np.genfromtxt([line])
In [36]: data
Out[36]: array([0.0054216, 0.11349 , 0.040749 ])
With the wrong delimiter it tries to convert the whole line to a float. It can't (same reason as above), so it uses np.nan instead.
In [37]: data = np.genfromtxt([line], delimiter=',')
In [38]: data
Out[38]: array(nan)
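The split-then-convert idea can also be applied to the first method from the question without NumPy. This is a minimal sketch, assuming every non-blank line holds whitespace-separated numbers; the name read_points is hypothetical, and it accepts any iterable of lines (an open file works as well as the sample list used here).

```python
def read_points(lines):
    """Parse whitespace-separated numbers, one point per line."""
    points = []
    for line in lines:
        fields = line.split()      # split on any run of whitespace
        if not fields:             # skip blank lines
            continue
        points.append([float(x) for x in fields])
    return points


# The four sample rows from the question.
sample = """0.0054216 0.11349 0.040749
-0.0017447 0.11425 0.041273
-0.010661 0.11338 0.040916
0.026422 0.11499 0.032623
"""

points = read_points(sample.splitlines())
print(len(points))   # 4
print(points[0])     # [0.0054216, 0.11349, 0.040749]
```

To read from disk, pass the file object directly, for example: with open('co.txt') as f: points = read_points(f).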

Write zlib compressed utf8 data to a file

I have a file with data encoded in utf-8. I would like to read the data, remove whitespaces, separate words with a newline, compress the entire content and write them to a file. This is what I am trying to do :
with codecs.open('1020104_4.utf8', encoding='utf8', mode='r') as fr :
    data = re.split(r'\s+',fr.read().encode('utf8'))
    #with codecs.open('out2', encoding='utf8', mode='w') as fw2 :
    data2 = ('\n'.join(data)).decode('utf8')
    data3 = zlib.compress(data2)
    #fw2.write(data3)
However I get an error :
Traceback (most recent call last):
File "tmp2.py", line 17, in <module>
data3 = zlib.compress(data2)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 47-48: ordinal not in range(128)
How can I write this data to a file?
I think your encoding-foo is just the wrong way round; in Python 3 this would be a lot clearer ☺.
First, when splitting you want to do this on decoded data, i.e. on Unicode strings, which you already get from read since you are using codecs.open, so the first line should be
data = re.split(r'\s+', fr.read())
Consequently, before passing data to zlib you want to convert it to bytes by encoding it:
data2 = ('\n'.join(data)).encode('utf8')
data3 = zlib.compress(data2)
In the last step you want to write it to a binary file handle:
with open("output", "wb") as fw:
    fw.write(data3)
You can shorten this a bit by using the gzip module instead:
with codecs.open('1020104_4.utf8', encoding='utf8', mode='r') as fr:
    data = re.split(r'\s+', fr.read())
with gzip.open('out2', mode='wb') as fw2:
    data2 = ('\n'.join(data)).encode('utf8')
    fw2.write(data2)
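For comparison, here is roughly how the same pipeline might look in Python 3, where text and bytes are separate types and the order of operations is harder to get wrong. It is a sketch under assumptions: the sample text, the temporary output path and the function name compress_words are made up for illustration, and the input is stripped first so that leading or trailing whitespace does not produce empty words.

```python
import os
import re
import tempfile
import zlib


def compress_words(text):
    """Split text on whitespace, join with newlines, return zlib bytes."""
    words = re.split(r"\s+", text.strip())   # operate on decoded text (str)
    joined = "\n".join(words)
    return zlib.compress(joined.encode("utf8"))  # encode only before compressing


# Hypothetical non-ASCII sample standing in for the contents of the input file.
text = "caf\u00e9  na\u00efve\tr\u00e9sum\u00e9\n"
blob = compress_words(text)

# Compressed data is bytes, so the output file is opened in binary mode.
path = os.path.join(tempfile.mkdtemp(), "out.z")
with open(path, "wb") as fw:
    fw.write(blob)

# Round trip to check that nothing was lost.
with open(path, "rb") as fr:
    restored = zlib.decompress(fr.read()).decode("utf8")
print(restored.split("\n"))
```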

read an ascii file into a numpy array

I have an ASCII file that I want to read into a numpy array. numpy.genfromtxt fails on it: it returns 'NaN' for the first number in the file. I then tried reading the file into an array like this:
lines = file('myfile.asc').readlines()
X = []
for line in lines:
    s = str.split(line)
    X.append([float(s[i]) for i in range(len(s))])
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
ValueError: could not convert string to float: 15.514
When I printed the first line of the file, it looked like this:
>>> s
['\xef\xbb\xbf15.514', '15.433', '15.224', '14.998', '14.792', '15.564', '15.386', '15.293', '15.305', '15.132', '15.073', '15.005', '14.929', '14.823', '14.766', '14.768', '14.789']
How can I read such a file into a numpy array without problems and without any assumption about the number of rows and columns?
Based on @falsetru's answer, I want to provide a solution that uses NumPy's file reading capabilities:
import numpy as np
import codecs
with codecs.open('myfile.asc', encoding='utf-8-sig') as f:
    X = np.loadtxt(f)
This opens the file with the correct encoding and passes the open file object to NumPy. NumPy accepts this kind of handle (it can also use handles from open()) and works seamlessly, as in every other case.
The file is encoded with utf-8 with BOM. Use codecs.open with utf-8-sig encoding to handle it correctly (To exclude BOM \xef\xbb\xbf).
import codecs
X = []
with codecs.open('myfile.asc', encoding='utf-8-sig') as f:
    for line in f:
        s = line.split()
        X.append([float(s[i]) for i in range(len(s))])
UPDATE You don't need to use index at all:
with codecs.open('myfile.asc', encoding='utf-8-sig') as f:
    X = [[float(x) for x in line.split()] for line in f]
BTW, instead of using the unbound method str.split(line), use line.split() if you have no special reason to do it.
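The effect of utf-8-sig can be checked in isolation. The following Python 3 sketch writes a small file that starts with the UTF-8 BOM (the two rows are a shortened, made-up version of the data in the question, and the temporary path is an assumption) and reads it back with both encodings.

```python
import io
import os
import tempfile

# Create a sample file that begins with the UTF-8 BOM (\xef\xbb\xbf).
path = os.path.join(tempfile.mkdtemp(), "myfile.asc")
with open(path, "wb") as f:
    f.write(b"\xef\xbb\xbf15.514 15.433 15.224\n14.998 14.792 15.564\n")

# Plain utf-8 keeps the BOM as the character U+FEFF glued to the first number.
with io.open(path, encoding="utf-8") as f:
    first_plain = f.readline().split()[0]

# utf-8-sig drops the BOM, so every field converts cleanly.
with io.open(path, encoding="utf-8-sig") as f:
    X = [[float(x) for x in line.split()] for line in f]

print(repr(first_plain))  # '\ufeff15.514'
print(X)                  # [[15.514, 15.433, 15.224], [14.998, 14.792, 15.564]]
```

Because utf-8-sig also reads files that have no BOM, it is a reasonable default when the origin of a text file is unknown.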

Error accessing binary data from a python list

I'm pretty new to python, using python 2.7. I have to read in a binary file, and then concatenate some of the bytes together. So I tried
f = open("filename", "rb")
j=0
infile = []
try:
    byte = f.read(1)
    while byte != "":
        infile.append(byte)
        byte = f.read(1)
finally:
    f.close()
blerg = (bin(infile[8])<<8 | bin(infile[9]))
print type
where I realize that the recast as binary is probably unnecessary, but this is one of my later attempts.
The error I'm getting is TypeError: 'str' object cannot be interpreted as index.
This is news to me, since I'm not using a string anywhere. What the !##% am I doing wrong?
EDIT: Full traceback
file binaryExtractor.py, line 25, in
blerg = (bin(infile[8])<<8 | bin(infile[9]))
TypeError: 'str' object cannot be interpreted as index
You should be using struct whenever possible instead of writing your own code for this.
>>> struct.unpack('<H', '\x12\x34')
(13330,)
You want to use the ord function which returns an integer from a single character string, not bin which returns a string representation of a binary number.
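Both suggestions can be compared directly. Note that the shift in the question puts byte 8 in the high position, which corresponds to big-endian order ('>H'), while the '<H' in the struct example above is little-endian. The sketch below is written for Python 3, where indexing a bytes object already yields integers (in Python 2.7 the same expression would need ord() around each one-character string); the 16-byte data value is a made-up stand-in for the file contents.

```python
import struct

# Hypothetical stand-in for the file contents: bytes 0x00, 0x01, ..., 0x0f.
data = bytes(range(16))

# Manual combination, as attempted in the question (byte 8 is the high byte).
by_shift = (data[8] << 8) | data[9]

# The same value via struct: '>H' is a big-endian unsigned 16-bit integer.
by_struct = struct.unpack(">H", data[8:10])[0]

# Python 3 also offers int.from_bytes for this.
by_int = int.from_bytes(data[8:10], "big")

# Little-endian interpretation of the same two bytes, for contrast.
little = struct.unpack("<H", data[8:10])[0]

print(by_shift, by_struct, by_int)  # 2057 2057 2057
print(little)                       # 2312
```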
