Can't reopen Django File as rb - python

Why does reopening a django.core.files File as binary not work?
from django.core.files import File
f = open('/home/user/test.zip')
test_file = File(f)
test_file.open(mode="rb")
test_file.read()
This gives me the error 'utf-8' codec can't decode byte 0x82 in position 14: invalid start byte, so passing 'rb' to open() evidently had no effect. I need this because I want to open a FileField as binary.

You need to open() the underlying file handle in binary mode, so:
with open('/home/user/test.zip', mode='rb') as f:
    test_file = File(f)
    test_file.open(mode='rb')
    test_file.read()
If the underlying file is not opened in binary mode, the reader tries to decode the contents as text, and fails on bytes that are not valid UTF-8. As far as I can tell from Django's source, File.open() only rewinds a wrapped file that is still open, so the mode passed to it is not applied in your example.
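The difference between the two modes can be reproduced without Django. The following is a minimal sketch using only the standard library; the file name and the sample bytes are made up for illustration and merely stand in for a real zip file.

```python
import os
import tempfile

# Hypothetical sample data: starts like a zip header and contains 0x82,
# which is not a valid UTF-8 start byte.
payload = b"PK\x03\x04\x14\x00\x82\xff"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.bin")
    with open(path, "wb") as f:
        f.write(payload)

    # Text mode: read() decodes the bytes and fails on the invalid byte.
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
        text_mode_failed = False
    except UnicodeDecodeError:
        text_mode_failed = True

    # Binary mode: read() returns the bytes untouched.
    with open(path, "rb") as f:
        binary_data = f.read()

print(text_mode_failed, binary_data == payload)
```

The mode is fixed when the operating-system file is opened, which is why the wrapper around it cannot switch an already open text handle to binary.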

Related

Read file from sys.stdin in 'rb' mode : Python

I have written the code below to convert a CSV file to an XML file. I am reading the file from sys.stdin and writing the output back to sys.stdout. I am getting the error below while reading a file.
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcc in position 7652: invalid continuation byte
I have researched the error and found that reading the input file in 'rb' mode may resolve it. How do I change the code below to read the input file from sys.stdin in 'rb' mode? I have not found an answer yet.
import csv
import sys
import os
from xml.dom.minidom import Document
filename = sys.argv[1]
filename = os.path.splitext(filename)[0]+'.xml'
pathname = "/tmp/"
output_file = pathname + filename
f = sys.stdin
reader = csv.reader(f)
fields = next(reader)
fields = [x.lower() for x in fields]
fieldsR = fields
doc = Document()
dataRoot = doc.createElement("rowset")
dataRoot.setAttribute('xmlns:xsi', "http://www.w3.org/2001/XMLSchema-instance")
dataRoot.setAttribute('xsi:schemaLocation', "./schema.xsd")
doc.appendChild(dataRoot)
for line in reader:
    dataElt = doc.createElement("row")
    for i in range(len(fieldsR)):
        dataElt.setAttribute(fieldsR[i], line[i])
    dataRoot.appendChild(dataElt)
xmlFile = open(output_file,'w')
xmlFile.write(doc.toprettyxml(indent = '\t'))
xmlFile.close()
sys.stdout.write(output_file)
In Python 3, stdin, stdout and stderr are all wrapped in text I/O layers that encode and decode the streams on the fly.
If you want direct access to the underlying binary stream, it is available as an attribute of these wrappers.
For stdin, instead of calling sys.stdin.read(), call sys.stdin.buffer.read(), or sys.stdin.buffer.raw.read() for the unbuffered stream beneath it.
The same applies to stdout and stderr: use .buffer (or .buffer.raw) to get to the underlying binary stream.
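Note that csv.reader in Python 3 expects text, not bytes, so for the CSV case it may be more useful to re-wrap the binary stream with an explicit encoding than to read raw bytes. Here is a minimal sketch under that assumption; read_csv_rows is a hypothetical helper, and "latin-1" is only an example encoding, not necessarily the one your file uses.

```python
import csv
import io


def read_csv_rows(binary_stream, encoding):
    """Decode a binary stream with the given encoding and parse it as CSV."""
    # newline="" lets the csv module handle line endings itself.
    text = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
    try:
        return list(csv.reader(text))
    finally:
        # Detach so the wrapper does not close the caller's stream.
        text.detach()


# Stand-in for sys.stdin.buffer: bytes that are valid Latin-1 but not UTF-8.
sample = io.BytesIO(b"name,city\nZo\xeb,K\xf6ln\n")
rows = read_csv_rows(sample, "latin-1")
print(rows)
```

In the script above this would be called as read_csv_rows(sys.stdin.buffer, "latin-1"), with the encoding replaced by whatever the input actually uses.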

Python reading a PE file and changing resource section

I am trying to open a Windows PE file and alter some strings in the resource section.
f = open(r'c:\test\file.exe', 'rb')
file = f.read()
if b'A'*10 in file:
    s = file.replace(b'A'*10, newstring)
In the resource section I have a string that is just:
AAAAAAAAAA
And I want to replace that with something else. When I read the file I get:
\x00A\x00A\x00A\x00A\x00A\x00A\x00A\x00A\x00A\x00A
I have tried opening the file as UTF-16 and decoding it as UTF-16, but then I run into an error:
UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 1604-1605: illegal encoding
Everyone I have seen with the same issue fixed it by decoding as UTF-16. I am not sure why this doesn't work for me.
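If the resource inside the binary file is encoded as UTF-16, don't decode the whole file. Most of an executable is not text, which is why decoding fails. Encode your search string instead and replace it at the byte level.
Try this:
with open('c:\\test\\file.exe', 'rb') as f:
    file = f.read()
unicode_str = 'AAAAAAAAAA'
# Windows resources are normally UTF-16 little-endian; plain 'UTF-16' would
# prepend a byte order mark, so the encoded string would never match.
encoded_str = unicode_str.encode('utf-16-le')
if encoded_str in file:
    # new_utf_string is your replacement text (a str)
    s = file.replace(encoded_str, new_utf_string.encode('utf-16-le'))
Keep in mind that everything inside a binary file is encoded bytes, so compare bytes with bytes.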
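To illustrate the byte-level replacement, here is a small self-contained sketch. replace_utf16le is a hypothetical helper, and the sample blob is invented; the length check is my own assumption that offsets in the file should stay unchanged, not something the answer above requires.

```python
def replace_utf16le(data, old, new):
    """Replace a UTF-16-LE encoded string inside a bytes blob."""
    old_bytes = old.encode("utf-16-le")
    new_bytes = new.encode("utf-16-le")
    # Assumption: keep the size identical so nothing after it shifts.
    if len(old_bytes) != len(new_bytes):
        raise ValueError("replacement must have the same encoded length")
    return data.replace(old_bytes, new_bytes)


# Invented sample: some binary bytes around a UTF-16-LE string.
blob = b"\x01\x02" + "AAAAAAAAAA".encode("utf-16-le") + b"\x03"
patched = replace_utf16le(blob, "A" * 10, "B" * 10)

# The generic "utf-16" codec prepends a byte order mark (BOM).
bom_prefixed = "A".encode("utf-16")
print(patched)
print(bom_prefixed[:2])
```

The last two lines show why the codec name matters: with the BOM in front, the encoded search string would not be found in the file.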

Python3 Django ZipFile HttpResponse UnicodeDecodeError

I created a zip file like so:
zipped_file = zipfile.ZipFile("csvOutput{}.zip".format(timestamp), 'w')
zipped_file.write(sms_file_path)
zipped_file.write(mail_maga_file_path)
zipped_file.close()
And want to send it, which I am currently using this code:
response_file = open('csvOutput{}.zip'.format(timestamp))
response = HttpResponse(response_file, content_type="application/force-download")
response['Content-Disposition'] = 'attachment; filename="csvOutput{}.zip"'.format(timestamp)
return response
raise Http404
But maybe something I am using is out of date? Python keeps crashing with a byte it can't decode in unicode:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x94 in position 14: invalid start byte
The traceback says line 100 is throwing the error, which is the line with HttpResponse().
Edit: I've changed the content_type to application/zip and am getting a new error (that seems better?):
caution: zipfile comment truncated
error [output.zip]: missing 3232109644 bytes in zipfile
(attempting to process anyway)
error [output.zip]: attempt to seek before beginning of zipfile
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)
It looks like the issue is with your timestamp string.
response_file = open('csvOutput{}.zip'.format(str(timestamp).encode('utf-8').strip()))
or
response_file = open('csvOutput{}.zip'.format(str(timestamp).encode('ISO-8859-1').strip()))
I've managed to fix the issue by opening the file in 'rb' mode, as in:
response_file = open('csvOutput{}.zip'.format(timestamp), 'rb')
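For reference, the round trip can be checked outside Django. This is a minimal standard-library sketch with made-up file names: it writes a zip, reads it back in 'rb' mode, and confirms the bytes still form a valid archive. Bytes read this way are what you would hand to the response.

```python
import io
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical CSV standing in for the real export files.
    csv_path = os.path.join(tmp, "sms.csv")
    with open(csv_path, "w", encoding="utf-8") as f:
        f.write("id,message\n1,hello\n")

    zip_path = os.path.join(tmp, "csvOutput.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.write(csv_path, arcname="sms.csv")

    # Binary mode returns the archive byte for byte, with no decoding.
    with open(zip_path, "rb") as f:
        payload = f.read()

# The bytes can be reopened as a zip, so nothing was corrupted.
names = zipfile.ZipFile(io.BytesIO(payload)).namelist()
print(payload[:2], names)
```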

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 0: invalid start byte

This is my code.
stock_code = open('/home/ubuntu/trading/456.csv', 'r')
csvReader = csv.reader(stock_code)
for st in csvReader:
    eventcode = st[1]
    print(eventcode)
I want to read the contents of the Excel (CSV) file, but I get a UnicodeDecodeError.
How can I fix it?
The CSV docs say,
Since open() is used to open a CSV file for reading, the file will by default be decoded into unicode using the system default encoding...
The error message shows that your system is expecting the file to be using UTF-8 encoding.
Solutions:
Make sure the file is using the correct encoding.
For example, open the file in Notepad++, select Encoding from the menu
and choose UTF-8. Then resave the file.
Alternatively, specify the encoding of the file when calling open(), like this:
import csv
my_encoding = 'UTF-8'  # or whatever the encoding of the file is
with open('/home/ubuntu/trading/456.csv', 'r', encoding=my_encoding, newline='') as stock_code:
    csvReader = csv.reader(stock_code)
    for st in csvReader:
        eventcode = st[1]
        print(eventcode)
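If you do not know the file's encoding, one rough approach is to try a short list of candidates in order. The sketch below is only an illustration: decode_with_fallback is a hypothetical helper, and the candidate list is an assumption you would adapt to where the file came from.

```python
def decode_with_fallback(raw, encodings):
    """Return (text, encoding) for the first encoding that decodes raw."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# 0xc1 can never appear in valid UTF-8, so the first candidate fails.
text, used = decode_with_fallback(b"\xc1bc,1\n", ["utf-8", "cp1252"])
print(used, repr(text))
```

Be aware that single-byte encodings such as cp1252 accept almost any byte sequence, so a successful decode does not prove the guess was right; check the result for garbled characters.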

Scipy: UnicodeDecodeError while loading image file

def image_to_laplacian(filename):
    with open(filename, 'r', encoding="latin-1") as f:
        s = f.read()
        img = sc.misc.imread(f)
image_to_laplacian('images/bw_3x3.png')
Produces:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte
"images/bw_3x3.png" is a 3x3 image I produced in Pinta. I tried opening a cat.jpg I got from Google Images, but I got the same error.
I also tried passing encoding="latin-1" as an argument to open, based on something I read on SO; I was able to open the file, but imread failed with the exception
OSError: cannot identify image file <_io.TextIOWrapper name='images/bw_3x3.png' mode='r' encoding='latin-1'>
The line that is causing the error is
s = f.read()
In 'r' mode this tries to decode the data as text, but it's an image file, so it fails. You can use 'rb' instead. Definitely remove encoding="latin-1", because that is only relevant for text files.
Also, note that according to the documentation:
name : str or file object
The file name or file object to be read.
So you can dispense with opening a file and just give it a filepath as a string. The following should work:
img = sc.misc.imread(filename)
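The "byte 0x89 in position 0" in the error is simply the first byte of every PNG file. A small standard-library sketch, with an invented header fragment, shows the signature and why decoding it as UTF-8 fails at position 0:

```python
# Every PNG file begins with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_png(data):
    """Return True if the bytes start with the PNG signature."""
    return data.startswith(PNG_SIGNATURE)


# Invented fragment: the signature plus the start of an IHDR chunk.
header = PNG_SIGNATURE + b"\x00\x00\x00\rIHDR"

failing_pos = None
failing_byte = None
try:
    header.decode("utf-8")
except UnicodeDecodeError as exc:
    # exc.start is the index of the first byte that could not be decoded.
    failing_pos = exc.start
    failing_byte = header[exc.start]

print(is_png(header), failing_pos, failing_byte)
```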
