I am uploading files in my Django application. They are saved to disk and retrieved later. This works well for most files. However, every once in a while there is a file (generally a PDF) that can't be retrieved properly. This is the error that comes up:
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 194: ordinal not in range(128)
I have seen other questions about this encoding issue, which all relate to how to deal with this when encoding plain text, but I am dealing with binary files. Here is my code:
Relevant upload code:
with open(path, "wb+") as destination:
    for chunk in attachment.chunks():
        destination.write(chunk)
Code to retrieve the file:
with open(file_path, "rb") as f:
    contents = f.read()
response = HttpResponse(contents, content_type="application")
response["Content-Disposition"] = "attachment; filename=\""+name + "\""
I understand there is an encoding issue, but where exactly should I fix this?
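A UnicodeEncodeError is raised when a str is encoded, not when bytes are read, so the binary contents read with "rb" are unlikely to be the source. A minimal diagnostic sketch, assuming the offending character sits in a str such as file_path or name: the helper non_ascii_positions and the sample values below are hypothetical and only illustrate how to locate such characters.

```python
import sys


def non_ascii_positions(text, encoding="ascii"):
    """Return (index, character) pairs that cannot be encoded with `encoding`."""
    bad = []
    for index, char in enumerate(text):
        try:
            char.encode(encoding)
        except UnicodeEncodeError:
            bad.append((index, char))
    return bad


# Hypothetical stand-ins for the real file_path and name values.
file_path = "/srv/uploads/r\u00e9sum\u00e9.pdf"
name = "r\u00e9sum\u00e9.pdf"

# If this reports 'ascii', opening a path with non-ASCII characters can fail.
print("filesystem encoding:", sys.getfilesystemencoding())
print("path:", non_ascii_positions(file_path))
print("name:", non_ascii_positions(name))
```

If the filesystem encoding turns out to be ASCII, the server's locale settings are a likely place to look. That is an assumption to verify against the full traceback, not a confirmed diagnosis.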
so i have variable contains in bytes and i want to save it to str. How can I do that? I have tried several approaches, but I always get this error:
'utf-8' codec can't decode byte 0xa0 in position 0: invalid start byte
I want to save the encrypted result to a text file, so that I have proof that the file was encrypted. Here is my first attempt:
def create_file(f):
    response = HttpResponse(content_type="text/plain")
    response['Content-Disposition'] = 'attachment; filename=file.txt'
    filename = f
    print(filename)
    name_byte = codecs.decode(filename)
    print(name_byte)
    return response
My second attempt:
def create_file(enc):
    with open("file.txt", "w", encoding="utf-8") as f:
        enc = enc.decode('utf-8')
        f.write(enc)
My third attempt:
def create_file(f):
    file = open("file.txt", "w")
    f = f.decode('utf-8')
    download = file.write(f)
    file.close()
    print(download)
    return download
f = b'\xa0\x0e\xdc\x14'
f is the value returned by the encryption step.
I call the function like this:
#in views
download = create_file(enc)
print(download)
#in urls
path("create_file", views.create_file, name="create_file"),
#in html
<a href="{% url 'create_file' %}">
The phrase "have variable contains in bytes and i want to save it to str" makes no sense at all. bytes and str are different data types serving different purposes: bytes holds a byte stream, while str holds abstract text. Yes, bytes can also hold textual information encoded in some text encoding such as UTF-8, but that is only a specific case. If, for example, your bytes hold a JPEG image, then converting them to str does not make any sense, simply because they are not text. So if your encrypted file contains a byte stream, you need to treat it as such. It is not text any more until you decrypt it.
So the only thing you can do is to save your byte stream as is using the binary file mode:
v = b'\xa0\x0e\xdc\x14'
with open('/path', 'wb') as fo:
    fo.write(v)
The same applies to sending the encrypted file as a Django response:
response = HttpResponse(v, content_type="application/octet-stream")
response['Content-Disposition'] = 'attachment; filename=file.bin'
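If a human-readable text file is still wanted as proof of encryption, one option is to store a reversible text encoding of the bytes, such as hex or Base64, instead of decoding them as UTF-8. This is a sketch rather than a required approach; the variable names are illustrative.

```python
import base64

v = b'\xa0\x0e\xdc\x14'  # sample encrypted bytes from the question

# Decoding as UTF-8 fails because 0xa0 is not a valid UTF-8 start byte.
try:
    v.decode('utf-8')
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# Reversible text representations of the same bytes.
hex_text = v.hex()
b64_text = base64.b64encode(v).decode('ascii')

# Both can be turned back into the original bytes without loss.
restored_from_hex = bytes.fromhex(hex_text)
restored_from_b64 = base64.b64decode(b64_text)

print(decoded_ok, hex_text, b64_text)
```

Either string can be written to a file opened in "w" mode, because it contains only ASCII characters.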
If you just want to write the entire line to file then all you have to do is convert to string.
So this would work:
v = b'\xa0\x0e\xdc\x14'
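with open("C:/test/output.txt", 'w') as file:
    file.write(str(v))
The result is a text file containing the Python repr of the bytes object, that is, the literal text b'\xa0\x0e\xdc\x14', rather than the raw bytes themselves.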
I have a zip/base64 data string. First, I decode it with base64. Then I use zipfile in Python to unzip the result.
Then I put the contents in the response as a PDF file. I can see that the PDF I downloaded has 7 pages and its length is about 75000, but all pages are blank. Is there something wrong with the decoding?
The .decode(errors='ignore') call came from another Stack Overflow post. Without it, I cannot decode the whole thing; it raises an error like "'utf-8' codec can't decode byte 0xfe in position 28".
Here is my code:
decoded = base64.b64decode(data)  # data is "zip/base64" type
with zipfile.ZipFile(io.BytesIO(decoded)) as zf:
    for name in zf.namelist():
        with zf.open(name) as f:
            contents = f.read().decode(errors='ignore')
            response = HttpResponse(
                contents, content_type="application/pdf"
            )
            response["Content-Disposition"] = 'attachment; filename="{}"'.format(
                report_name + ".pdf"
            )
            return response
You shouldn't try to decode the file contents, since it's not text. f.read() returns a bytestring, which HttpResponse will accept perfectly fine for the page content.
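To illustrate why the pages can come out blank, here is a small self-contained sketch. It assumes a made-up byte string standing in for the PDF and an in-memory zip archive. It shows that decode(errors='ignore') silently drops every byte that is not valid UTF-8, so the data sent to the browser is no longer the original file.

```python
import io
import zipfile

# Stand-in for PDF content: a header plus bytes that are not valid UTF-8.
original = b"%PDF-1.4\n" + bytes([0xFE, 0xFF, 0x80, 0x90]) + b"\n%%EOF\n"

# Build a zip archive in memory, as a substitute for the decoded base64 data.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("report.pdf", original)

# Reading the member returns bytes that match the original exactly.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as zf:
    raw = zf.read("report.pdf")

# Decoding with errors='ignore' discards the invalid bytes for good.
lossy = raw.decode(errors="ignore").encode()

print(raw == original)
print(len(raw), len(lossy))
```

Passing raw straight to HttpResponse keeps every byte, which is what the answer above recommends.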
I am trying to open a Windows PE file and alter some strings in the resource section.
f = open('c:\test\file.exe', 'rb')
file = f.read()
if b'A'*10 in file:
    s = file.replace(b'A'*10, newstring)
In the resource section I have a string that is just:
AAAAAAAAAA
And I want to replace that with something else. When I read the file I get:
\x00A\x00A\x00A\x00A\x00A\x00A\x00A\x00A\x00A\x00A
I have tried opening the file as UTF-16 and decoding it as UTF-16, but then I run into an error:
UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 1604-1605: illegal encoding
Everyone I have seen with the same issue fixed it by decoding as UTF-16. I am not sure why this doesn't work for me.
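If the resource inside the binary file is encoded as UTF-16, you shouldn't decode the whole file. Encode your search string instead.
Try this:
with open('c:\\test\\file.exe', 'rb') as f:
    file = f.read()
unicode_str = 'AAAAAAAAAA'
# 'utf-16-le' avoids the byte order mark that plain 'UTF-16' prepends,
# which would keep the pattern from matching inside the file.
encoded_str = unicode_str.encode('utf-16-le')
if encoded_str in file:
    # new_utf_string is the replacement text (a str); keeping it the same
    # length avoids shifting the rest of the file.
    s = file.replace(encoded_str, new_utf_string.encode('utf-16-le'))
Keep in mind that everything inside a binary file is encoded.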
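The byte-level search can be tried on in-memory data before touching a real executable. In this sketch, data is a made-up byte string rather than a real PE file, and the replacement text is chosen to have the same length as the original. It also shows that the generic "utf-16" codec prepends a byte order mark when encoding, which is why the explicit little-endian codec is used here.

```python
text = "AAAAAAAAAA"

# The generic codec adds a 2-byte byte order mark in front of the data.
with_bom = text.encode("utf-16")
# The explicit little-endian codec produces only the character data.
little_endian = text.encode("utf-16-le")

# Made-up bytes standing in for part of a binary file.
data = b"\x01\x02" + little_endian + b"\x00\x00\x03"

# Same-length replacement, so no other offsets in the data move.
replacement = "BBBBBBBBBB".encode("utf-16-le")
patched = data.replace(little_endian, replacement)

print(with_bom in data, little_endian in data)
print(len(data) == len(patched))
```

Whether a real file uses little-endian or big-endian UTF-16 is an assumption to check against the actual bytes. Windows resources are normally little-endian.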
I created a zip file like so:
zipped_file = zipfile.ZipFile("csvOutput{}.zip".format(timestamp), 'w')
zipped_file.write(sms_file_path)
zipped_file.write(mail_maga_file_path)
zipped_file.close()
I want to send it, and I am currently using this code:
response_file = open('csvOutput{}.zip'.format(timestamp))
response = HttpResponse(response_file, content_type="application/force-download")
response['Content-Disposition'] = 'attachment; filename=csvOutput{}.zip"'.format(timestamp)
return response
raise Http404
But maybe something I am using is out of date? Python keeps crashing with a byte it can't decode in unicode:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x94 in position 14: invalid start byte
The traceback says line 100 is throwing the error, which is the line with HttpResponse().
Edit: I've changed the content_type to application/zip and am getting a new error (that seems better?):
caution: zipfile comment truncated
error [output.zip]: missing 3232109644 bytes in zipfile
(attempting to process anyway)
error [output.zip]: attempt to seek before beginning of zipfile
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)
Looks like this issue is with your time stamp string.
response_file = open('csvOutput{}.zip'.format(str(timestamp).encode('utf-8').strip()))
or
response_file = open('csvOutput{}.zip'.format(str(timestamp).encode('ISO-8859-1').strip()))
I've managed to fix the issue, by opening the file with 'rb' as in:
response_file = open('csvOutput{}.zip'.format(timestamp), 'rb')
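The difference between the two modes can be reproduced without Django. The sketch below uses a temporary directory, and the file name and payload are made up. It writes a small zip file, shows that reading it as UTF-8 text raises UnicodeDecodeError, and shows that "rb" returns the bytes unchanged.

```python
import os
import tempfile
import zipfile

# Create a small zip file in a temporary directory.
tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, "example.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    # The payload includes bytes that are never valid UTF-8.
    zf.writestr("data.bin", b"\x94\xff\xfe payload")

# Text mode tries to decode the archive and fails.
try:
    with open(zip_path, encoding="utf-8") as fh:
        fh.read()
    text_mode_failed = False
except UnicodeDecodeError:
    text_mode_failed = True

# Binary mode returns the raw bytes, which is what an HTTP response needs.
with open(zip_path, "rb") as fh:
    payload = fh.read()

print(text_mode_failed, payload[:2])
```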
This is my code.
stock_code = open('/home/ubuntu/trading/456.csv', 'r')
csvReader = csv.reader(stock_code)
for st in csvReader:
    eventcode = st[1]
    print(eventcode)
I want to read the content of the Excel file, but I get a UnicodeDecodeError.
How can I fix it?
The CSV docs say,
Since open() is used to open a CSV file for reading, the file will by default be decoded into unicode using the system default encoding...
The error message shows that your system is expecting the file to be using UTF-8 encoding.
Solutions:
Make sure the file is using the correct encoding.
For example, open the file using NotePad++, select Encoding from the menu
and select UTF-8. Then resave the file.
Alternatively, specify the encoding of the file when calling open(), like this:
my_encoding = 'UTF-8'  # or whatever is the encoding of the file.
with open('/home/ubuntu/trading/456.csv', 'r', encoding=my_encoding) as stock_code:
    csvReader = csv.reader(stock_code)
    for st in csvReader:
        eventcode = st[1]
        print(eventcode)
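When the encoding of the file is not known in advance, one heuristic is to try a short list of candidates in order. The sketch below is an assumption-laden example, not a general detector: the helper read_second_column, the candidate list, and the sample file written in cp1252 are all made up for illustration.

```python
import csv
import os
import tempfile

# Write a sample CSV in cp1252, standing in for a file exported on Windows.
tmp_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(tmp_path, "w", encoding="cp1252", newline="") as fh:
    csv.writer(fh).writerows([["1", "caf\u00e9"], ["2", "na\u00efve"]])


def read_second_column(path, encodings=("utf-8", "cp1252")):
    """Return (encoding, values) using the first encoding that decodes the file."""
    for enc in encodings:
        try:
            with open(path, "r", encoding=enc, newline="") as fh:
                return enc, [row[1] for row in csv.reader(fh)]
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings matched")


used, values = read_second_column(tmp_path)
print(used, values)
```

A single-byte encoding such as cp1252 accepts almost any input, so a successful decode does not prove the guess is right. It is worth checking the printed values by eye.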