How to turn a string into a binary object in python


I'm using this library to download and decode MMS PDUs:
https://github.com/pmarti/python-messaging
The sample code almost works, except that this method:
mms = MMSMessage.from_data(response)
Is throwing an exception:
TypeError: unsupported operand type(s) for &: 'str' and 'int'
Which seems to obviously be some sort of binary formatting problem.
In the sample code, the HTTP response is passed directly into the from_data method, however in my case it comes through with HTTP headers on it so I'm splitting the response by double CRLF and then passing in just the PDU data:
data = buf.getvalue()
split = data.split("\r\n\r\n");
mms = MMSMessage.from_data(split[1].strip())
This throws an error BUT if I first write the exact same data to a file then use the from_file method it works:
data = buf.getvalue()
split = data.split("\r\n\r\n");
f = open('dump','w+')
f.write(split[1])
f.close()
path = 'dump'
mms = MMSMessage.from_file(path)
I looked in the from_file method, and all it does is load the contents and then pass it into the same method as the from_data method, so the first way should Just Work™.
What I did notice is that the file is opened in binary format, and the content is loaded like this:
data = array.array('B')
with open(filename, 'rb') as f:
    data.fromfile(f, num_bytes)
return self.decode_data(data)
So it seems obvious that somehow what I'm passing into the first function is actually a "string representation of binary data" and what's being read from the file is "actual binary data".
I tried using bytearray like this to "binaryfy" the string:
mms = MMSMessage.from_data(bytearray(split[1].strip(), "utf8"))
but that throws the error:
Traceback (most recent call last):
File "decodepdu.py", line 41, in <module>
mms = MMSMessage.from_data(bytearray(split[1].strip(), "utf8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8c in position 0: ordinal not in range(128)
which seems weird because it's using an 'ascii' codec but I specified utf8 encoding.
Anyway at this point I'm in over my head because I'm not really all that familiar with python, so for now I'm just writing the content to a temporary file but I would really rather not.
Any help would be most appreciated!

Okay thanks to Paul M. in the comments, this works:
data = buf.getvalue()
split = data.split("\r\n\r\n");
pdu = array.array('B');
pdu.fromstring(split[1]);
mms = MMSMessage.from_data(pdu);
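For readers on Python 3, where the HTTP response is already a bytes object and array.fromstring no longer exists (it was removed in Python 3.9), the same idea might be sketched as follows. The sample response bytes and the extract_pdu helper are made up for illustration, and the final call to MMSMessage.from_data is left as a comment because it is only an assumption that the library accepts the array unchanged on Python 3.

```python
import array


def extract_pdu(raw_response: bytes) -> array.array:
    """Return the body of a raw HTTP response as an array of unsigned bytes."""
    # partition() splits on the first blank line only, so a CRLF CRLF
    # sequence that happens to occur inside the binary body is left intact.
    _headers, _sep, body = raw_response.partition(b"\r\n\r\n")
    return array.array("B", body)


# Hypothetical response: two header lines followed by three binary PDU bytes.
sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: application/vnd.wap.mms-message\r\n"
    b"\r\n"
    b"\x8c\x84\x98"
)
pdu = extract_pdu(sample)

# Each element is an int, so bitwise operators such as & work on it.
first_byte_high_bit = pdu[0] & 0x80

# mms = MMSMessage.from_data(pdu)  # assumed usage, mirroring the answer above
```

Note that the body is not stripped here: strip() on binary data can remove bytes that merely look like whitespace but belong to the PDU.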

Related

unable to decode this string using python

I have this text.ucs file which I am trying to decode using python.
file = open('text.ucs', 'r')
content = file.read()
print content
My result is
\xf\xe\x002\22
I tried doing decoding with utf-16, utf-8
content.decode('utf-16')
and getting error
Traceback (most recent call last):
  File "", line 1, in
  File "C:\Python27\lib\encodings\utf_16.py", line 16, in decode
    return codecs.utf_16_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode bytes in position 32-33: illegal encoding
Please let me know if I am missing anything or my approach is wrong
Edit: Screenshot has been asked

The string is encoded as UTF16-BE (Big Endian), this works:
content.decode("utf-16-be")
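As a small illustration of why the byte order matters, the sketch below encodes a made-up sample string as UTF-16-BE and decodes it again. It is written for Python 3, where the bytes type makes the distinction easy to see; the sample text is an assumption and not the contents of text.ucs.

```python
sample_text = "hello"

# Big-endian UTF-16 without a byte order mark: the high byte comes first.
raw = sample_text.encode("utf-16-be")

# Naming the byte order explicitly recovers the original text.
decoded = raw.decode("utf-16-be")

# Reading the same bytes with the opposite byte order does not fail here,
# but it silently produces different characters.
misread = raw.decode("utf-16-le")
```

When a file has no byte order mark, the decoder cannot tell which order was used, so spelling out "utf-16-be" or "utf-16-le" is the safer choice.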

It looks like you are using Python 2.x. As far as I know, the encoding parameter of the built-in open() was only added in Python 3.x. I'm not an expert in Python 2.x, but you can look up io.open, which accepts it. For example:
import io
file = io.open('text.ucs', 'r', encoding='utf-8')
content = file.read()
print content
Note that the io module has to be imported first.

You can specify which encoding to use with the encoding argument:
with open('text.ucs', 'r', encoding='utf-16') as f:
    text = f.read()

Your string needs to be decoded as UTF-8. This is what I did to decode it:
import io
f = io.open('text.ucs', 'r', encoding='utf-8')
print f.read()

Python3 Django ZipFile HttpResponse UnicodeDecodeError

I created a zip file like so:
zipped_file = zipfile.ZipFile("csvOutput{}.zip".format(timestamp), 'w')
zipped_file.write(sms_file_path)
zipped_file.write(mail_maga_file_path)
zipped_file.close()
And want to send it, which I am currently using this code:
response_file = open('csvOutput{}.zip'.format(timestamp))
response = HttpResponse(response_file, content_type="application/force-download")
response['Content-Disposition'] = 'attachment; filename=csvOutput{}.zip"'.format(timestamp)
return response
raise Http404
But maybe something I am using is out of date? Python keeps crashing with a byte it can't decode in unicode:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x94 in position 14: invalid start byte
Says line 100 is throwing the error, which would be the line with HttpResponse()
Edit: I've changed the content_type to application/zip and am getting a new error (that seems better?):
caution: zipfile comment truncated
error [output.zip]: missing 3232109644 bytes in zipfile
(attempting to process anyway)
error [output.zip]: attempt to seek before beginning of zipfile
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)

Looks like this issue is with your time stamp string.
response_file = open('csvOutput{}.zip'.format(str(timestamp).encode('utf-8').strip()))
or
response_file = open('csvOutput{}.zip'.format(str(timestamp).encode('ISO-8859-1').strip()))

I've managed to fix the issue, by opening the file with 'rb' as in:
response_file = open('csvOutput{}.zip'.format(timestamp), 'rb')
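A minimal sketch of why 'rb' matters, using only the standard library so that it runs without Django: it writes a small zip archive to a temporary directory and reads it back as raw bytes. The file names and CSV contents are invented for illustration; in the view above, the resulting bytes (or the open binary file object) would be what gets handed to HttpResponse.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "csvOutput.zip")  # hypothetical file name

    # Build a tiny archive with one made-up CSV member.
    with zipfile.ZipFile(zip_path, "w") as zipped_file:
        zipped_file.writestr("sms.csv", "id,text\n1,hello\n")

    # Binary mode returns the bytes exactly as stored on disk; text mode
    # would try to decode them and can fail with UnicodeDecodeError.
    with open(zip_path, "rb") as response_file:
        payload = response_file.read()

is_bytes = isinstance(payload, bytes)
starts_with_zip_magic = payload.startswith(b"PK")
```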

Scipy: UnicodeDecodeError while loading image file

def image_to_laplacian(filename):
    with open(filename, 'r', encoding="latin-1") as f:
        s = f.read()
        img = sc.misc.imread(f)
image_to_laplacian('images/bw_3x3.png')
Produces:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
"images/bw_3x3.png" is a 3x3 image I produced in Pinta. I tried opening a cat.jpg I got from Google Images, but I got the same error.
I also tried to use encoding="latin-1" as an argument to open, based on something I read on SO; I was able to open the file, but imread failed with the exception
OSError: cannot identify image file <_io.TextIOWrapper name='images/bw_3x3.png' mode='r' encoding='latin-1'>

The line that is causing the error is
s = f.read()
In 'r' mode this tries to decode the data as text, but it's an image file, so it will fail. You can use 'rb' instead. Definitely remove the encoding="latin-1" argument because that's only relevant for text files.
Also, note that according to the documentation:
name : str or file object
The file name or file object to be read.
So you can dispense with opening a file and just give it a filepath as a string. The following should work:
img = sc.misc.imread(filename)
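To see where the original error comes from, here is a small sketch that does not depend on SciPy. Every PNG file starts with the same eight-byte signature, and its first byte (0x89) is not a valid way to start a UTF-8 character, which is what the error message reports. The temporary file name is made up, and the file contains only the signature, not a full image.

```python
import os
import tempfile

# The fixed eight-byte signature found at the start of every PNG file.
png_signature = b"\x89PNG\r\n\x1a\n"

# Decoding the signature as UTF-8 reproduces the failure from the question.
try:
    png_signature.decode("utf-8")
    text_mode_ok = True
    reason = None
except UnicodeDecodeError as exc:
    text_mode_ok = False
    reason = exc.reason

# Reading in binary mode performs no decoding at all, so it always succeeds.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tiny.png")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(png_signature)
    with open(path, "rb") as f:
        header = f.read(8)
```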

Python 3: JSON File Load with Non-ASCII Characters

just trying to load this JSON file(with non-ascii characters) as a python dictionary with Unicode encoding but still getting this error:
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 90: ordinal not in range(128)
JSON file content = "tooltip":{
"dxPivotGrid-sortRowBySummary": "Sort\"{0}\"byThisRow",}
import sys
import json
data = []
with open('/Users/myvb/Desktop/Automation/pt-PT.json') as f:
    for line in f:
        data.append(json.loads(line.encode('utf-8','replace')))

You have several problems as near as I can tell. First is the file encoding. When you open a file without specifying an encoding, the file is opened with whatever locale.getpreferredencoding() returns. Since that may vary (especially on Windows machines), it's a good idea to explicitly use encoding="utf-8" for most JSON files. Because of your error message, I suspect that the file was opened with an ASCII encoding.
Next, the file is decoded into Python strings as it is read by the file object. Each line has already been decoded to a str and is ready for json to read. When you do line.encode('utf-8','replace'), you encode the line back into a bytes object, which json.loads (that is, "load string") could not handle before Python 3.6.
Finally, "tooltip":{ "navbar":"Operações de grupo"} isn't valid JSON, but it does look like one line of a pretty-printed JSON file containing a single JSON object. My guess is that you should read the entire file as one JSON object.
Putting it all together you get:
import json
with open('/Users/myvb/Desktop/Automation/pt-PT.json', encoding="utf-8") as f:
    data = json.load(f)
From its name, it's possible that this file is encoded as a Windows Portuguese code page. If so, the "cp860" encoding may work better.
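The approach can be tried end to end with a throwaway file. The sketch below writes a pretty-printed JSON document containing non-ASCII text to a temporary directory and loads it back in one call; the file name and contents are stand-ins for the real pt-PT.json, which is assumed to be UTF-8.

```python
import json
import os
import tempfile

# Stand-in for the real file contents.
sample = {"tooltip": {"navbar": "Operações de grupo"}}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pt-PT.json")

    # Write pretty-printed JSON, keeping the accented characters as-is.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f, ensure_ascii=False, indent=2)

    # Load the whole file as a single JSON object, with an explicit encoding.
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
```

Because the document spans several lines, parsing it line by line with json.loads would fail even with the right encoding, which is why the whole file is passed to json.load.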

I had the same problem, what worked for me was creating a regular expression, and parsing every line from the json file:
import re
REGEXP = r"[^A-Za-z0-9':.;\-?!]+"
new_file_line = re.sub(REGEXP, ' ', old_file_line).strip()

Having a file with content similar to yours I can read the file in one simple shot:
>>> import json
>>> fname = "data.json"
>>> with open(fname) as f:
...     data = json.load(f)
...
>>> data
{'tooltip': {'navbar': 'Operações de grupo'}}

You don't need to read each line. You have two options:
import sys
import json
data = []
with open('/Users/myvb/Desktop/Automation/pt-PT.json') as f:
    data.append(json.load(f))
Or, you can load all lines and pass them to the json module:
import sys
import json
data = []
with open('/Users/myvb/Desktop/Automation/pt-PT.json') as f:
    data.append(json.loads(''.join(f.readlines())))
Obviously, the first suggestion is the best.

Python 3.4 - reading data from a webpage

I'm currently trying to learn how to read from a webpage, and have tried the following:
>>> import urllib.request
>>> page = urllib.request.urlopen("http://docs.python-requests.org/en/latest/", data = None)
>>> contents = page.read()
>>> lines = contents.split('\n')
This gives the following error:
Traceback (most recent call last):
File "<pyshell#4>", line 1, in <module>
lines = contents.split('\n')
TypeError: Type str doesn't support the buffer API
Now I assumed that reading from a URL would be pretty similar to reading from a text file, and that the contents of contents would be of type str. Is this not the case?
When I try >>> contents I can see that the contents of contents is just the HTML document, so why doesn't .split('\n') work? How can I make it work?
Please note that I'm splitting at the newline characters so I can print the webpage line by line.
Following the same train of thought, I then tried contents.readlines() which gave this error:
Traceback (most recent call last):
File "<pyshell#8>", line 1, in <module>
contents.readlines()
AttributeError: 'bytes' object has no attribute 'readlines'
Is the webpage stored in some object called 'bytes'?
Can someone explain to me what is happening here? And how to read the webpage properly?

You need to wrap it with an io.TextIOWrapper() object and specify the encoding (UTF-8 is a common default; change it to the page's actual encoding if it differs):
import urllib.request
import io
u = urllib.request.urlopen("http://docs.python-requests.org/en/latest/", data = None)
f = io.TextIOWrapper(u,encoding='utf-8')
text = f.read()

Decode the bytes object to produce a string:
lines = contents.decode(encoding="UTF-8").split("\n")

The return type of the read() method is of type bytes. You need to properly decode it to a string before you can use a string method like split. Assuming it is UTF-8 you can use:
s = contents.decode('utf-8')
lines = s.split('\n')
As a general solution you should check the character encoding the server provides in the response to your request and use that.
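One way to follow that advice is sketched below. With urllib.request, the response headers (page.headers) are an email.message.Message subclass, so get_content_charset() can report the charset the server declared. To keep the example runnable offline, it builds such a headers object by hand; the decode_body helper, the header value, and the body bytes are all invented for illustration.

```python
from email.message import Message


def decode_body(raw: bytes, headers: Message, default: str = "utf-8") -> str:
    """Decode raw bytes using the charset from Content-Type, if one is given."""
    charset = headers.get_content_charset() or default
    return raw.decode(charset)


# Hypothetical stand-ins for page.headers and page.read().
headers = Message()
headers["Content-Type"] = "text/html; charset=ISO-8859-1"
raw = "café\nmenu".encode("iso-8859-1")

lines = decode_body(raw, headers).split("\n")

# With no charset declared, the helper falls back to the default.
plain_headers = Message()
plain_headers["Content-Type"] = "text/html"
fallback = decode_body(b"ok", plain_headers)
```

With a live response this would read decode_body(page.read(), page.headers). Falling back to UTF-8 is a guess, so a page that declares no charset may still decode incorrectly.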
