Python3 Django ZipFile HttpResponse UnicodeDecodeError - python

I created a zip file like so:
zipped_file = zipfile.ZipFile("csvOutput{}.zip".format(timestamp), 'w')
zipped_file.write(sms_file_path)
zipped_file.write(mail_maga_file_path)
zipped_file.close()
I want to send it, and I am currently using this code:
response_file = open('csvOutput{}.zip'.format(timestamp))
response = HttpResponse(response_file, content_type="application/force-download")
response['Content-Disposition'] = 'attachment; filename="csvOutput{}.zip"'.format(timestamp)
return response
raise Http404
Is something I am using out of date? Python keeps crashing on a byte it can't decode as UTF-8:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x94 in position 14: invalid start byte
The traceback says line 100 is throwing the error, which is the line with HttpResponse().
Edit: I've changed the content_type to application/zip and am getting a new error (that seems better?):
caution: zipfile comment truncated
error [output.zip]: missing 3232109644 bytes in zipfile
(attempting to process anyway)
error [output.zip]: attempt to seek before beginning of zipfile
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)

It looks like the issue is with your timestamp string.
response_file = open('csvOutput{}.zip'.format(str(timestamp).encode('utf-8').strip()))
or
response_file = open('csvOutput{}.zip'.format(str(timestamp).encode('ISO-8859-1').strip()))

I've managed to fix the issue by opening the file with 'rb', as in:
response_file = open('csvOutput{}.zip'.format(timestamp), 'rb')
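To see why 'rb' fixes it, here is a small stdlib-only sketch. The file and member names are made up for illustration. It builds a zip whose stored member contains the byte 0x94, then reads the archive both ways: text mode decodes the bytes as UTF-8 and fails, while binary mode returns the raw bytes unchanged.

```python
import os
import tempfile
import zipfile


def make_sample_zip(path: str) -> None:
    # ZIP_STORED keeps the member bytes verbatim inside the archive, so the
    # non-UTF-8 byte 0x94 (the byte from the traceback) really appears in the file.
    with zipfile.ZipFile(path, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("data.bin", b"\x94" * 100)


def read_text_mode(path: str) -> str:
    # Text mode decodes every byte as UTF-8, like open(path) does when the
    # default encoding is UTF-8.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


def read_binary_mode(path: str) -> bytes:
    # Binary mode hands back the raw bytes without decoding anything.
    with open(path, "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "csvOutput.zip")
    make_sample_zip(zip_path)
    try:
        read_text_mode(zip_path)
        text_mode_failed = False
    except UnicodeDecodeError:
        text_mode_failed = True
    payload = read_binary_mode(zip_path)

print(text_mode_failed)  # True
print(payload[:4])       # b'PK\x03\x04'
```

In the Django view, bytes like `payload` can be passed straight to `HttpResponse(payload, content_type="application/zip")`.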

Related

Decode zip/base64 file as a PDF file in response in Python (Django), why my PDF is blank?

I have a zip/base64 data string. First, I base64-decode it, then I use zipfile in Python to unzip it.
Then I put the contents in the response as a PDF file. The PDF I download has 7 pages and is about 75000 bytes long, but all pages are blank. Is something wrong with the decoding?
I took .decode(errors='ignore') from another Stack Overflow post. Without it, I cannot decode the whole thing; it fails with an error like "'utf-8' codec can't decode byte 0xfe in position 28".
Here is my code:
decoded = base64.b64decode(data)  # data is a "zip/base64" string
with zipfile.ZipFile(io.BytesIO(decoded)) as zf:
    for name in zf.namelist():
        with zf.open(name) as f:
            contents = f.read().decode(errors='ignore')
            response = HttpResponse(
                contents, content_type="application/pdf"
            )
            response["Content-Disposition"] = 'attachment; filename="{}"'.format(
                report_name + ".pdf"
            )
            return response
You shouldn't try to decode the file contents, since it's not text. f.read() returns a bytestring, which HttpResponse will accept perfectly fine for the page content.
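A minimal sketch of the bytes-only version follows. The function name and sample data are hypothetical, and it assumes, like the loop in the question, that the first archive member is the PDF. `decode(errors='ignore')` silently drops every byte that isn't valid UTF-8, which is why the resulting PDF was the wrong size and rendered blank; keeping the member as bytes avoids that.

```python
import base64
import io
import zipfile


def pdf_bytes_from_b64_zip(data: str) -> bytes:
    # Base64 -> zip archive bytes -> first member, all without any text decoding.
    decoded = base64.b64decode(data)
    with zipfile.ZipFile(io.BytesIO(decoded)) as zf:
        name = zf.namelist()[0]
        # A PDF is binary, so return the member exactly as stored.
        return zf.read(name)


# Usage with made-up data: zip a fake PDF in memory, base64-encode it, then decode.
sample = b"%PDF-1.4\n\xfe\xff\x00 binary body"
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("report.pdf", sample)
encoded = base64.b64encode(buf.getvalue()).decode("ascii")
pdf_bytes = pdf_bytes_from_b64_zip(encoded)
print(pdf_bytes == sample)  # True
```

The returned bytes can then go into `HttpResponse(pdf_bytes, content_type="application/pdf")` unchanged.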

UTF-8 Codec Error When Assigning Uncompressed GZip File from URL to String Variable

I download a gzip log from a URL and save its contents to a variable. Later, I want to iterate over that string line by line. If I just save the file and open it in Notepad++, I can see that the saved log file is in UTF-8 encoding.
I wanted to skip saving the file and reopening it to parse it, so I tried assigning the file contents to a variable and then using io.StringIO to iterate over each line. This works fine, but occasionally the script blows up with the following error when it reaches the line return str(file_content, 'utf-8'):
Exception Raised in connect function: 'utf-8' codec can't decode byte 0xe0 in position 138037: invalid continuation byte
Here is the section of code that makes the request and then assigns to string variable.
# Making a get request with basic authentication
request = urllib.request.Request(url)
base64string = base64.b64encode(bytes('%s:%s' % ('xxxxx', 'xxxxx'),'ascii'))
request.add_header("Authorization", "Basic %s" % base64string.decode('utf-8'))
# open request and then use gzip to read the shoutcast log that is in gzip format, then save uncompressed version
with urllib.request.urlopen(request) as response:
    with gzip.GzipFile(fileobj=response) as uncompressed:
        file_content = uncompressed.read()
return str(file_content, 'utf-8')
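This question has no answer here. One common approach, assuming an occasional bad byte can be replaced rather than recovered exactly, is to decode the decompressed stream line by line with `errors="replace"`, which turns undecodable bytes into U+FFFD instead of raising. The function name is hypothetical. If the log is really in a single-byte code page, passing that encoding (for example `"latin-1"`, which can decode any byte) is an alternative.

```python
import gzip
import io
from typing import BinaryIO, List


def read_gzip_log_lines(fileobj: BinaryIO, encoding: str = "utf-8") -> List[str]:
    # GzipFile decompresses on the fly; TextIOWrapper decodes the byte stream
    # line by line. errors="replace" swaps undecodable bytes for U+FFFD
    # instead of raising UnicodeDecodeError.
    with gzip.GzipFile(fileobj=fileobj) as uncompressed:
        text = io.TextIOWrapper(uncompressed, encoding=encoding, errors="replace")
        return [line.rstrip("\n") for line in text]


# Usage with made-up data: the second line contains 0xE0 followed by a space,
# which is not valid UTF-8.
raw = gzip.compress(b"ok line\nbad \xe0 line\n")
lines = read_gzip_log_lines(io.BytesIO(raw))
print(lines)
```

With the code from the question, the same function would be called on the open response, e.g. `with urllib.request.urlopen(request) as response: lines = read_gzip_log_lines(response)`.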

Can't reopen Django File as rb

Why does reopening a django.core.files File as binary not work?
from django.core.files import File
f = open('/home/user/test.zip')
test_file = File(f)
test_file.open(mode="rb")
test_file.read()
This gives me the error 'utf-8' codec can't decode byte 0x82 in position 14: invalid start byte, so opening in 'rb' evidently didn't work. I need this because I want to open a FileField as binary.
You need to open(…) the underlying file handle in binary mode:
with open('/home/user/test.zip', mode='rb') as f:
    test_file = File(f)
    test_file.open(mode='rb')
    test_file.read()
Without opening it in binary mode, the underlying reader will try to read the file as text, and thus fail on bytes that are not valid UTF-8.

Python reading a PE file and changing resource section

I am trying to open a Windows PE file and alter some strings in the resource section.
f = open(r'c:\test\file.exe', 'rb')
file = f.read()
if b'A'*10 in file:
    s = file.replace(b'A'*10, newstring)
In the resource section I have a string that is just:
AAAAAAAAAA
And I want to replace that with something else. When I read the file I get:
\x00A\x00A\x00A\x00A\x00A\x00A\x00A\x00A\x00A\x00A
I have tried opening the file as UTF-16 and decoding it as UTF-16, but then I run into an error:
UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 1604-1605: illegal encoding
Everyone I've seen with the same issue fixed it by decoding as UTF-16. I am not sure why this doesn't work for me.
If the resource inside the binary file is encoded as UTF-16, you shouldn't change the encoding.
Try this:
with open('c:\\test\\file.exe', 'rb') as f:
    file = f.read()
unicode_str = u'AAAAAAAAAA'
# utf-16-le: plain 'UTF-16' would prepend a byte-order mark (BOM) that isn't in the file
encoded_str = unicode_str.encode('utf-16-le')
if encoded_str in file:
    s = file.replace(encoded_str, new_utf_string.encode('utf-16-le'))
Keep in mind that inside a binary file, everything is already encoded as bytes.
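A minimal sketch of the byte-level replacement follows. The helper name is hypothetical, and it rests on an assumption: keeping the replacement the same byte length is the safest choice, because data in a PE file is located by sizes and offsets, so growing or shrinking a string would shift everything after it. Padding a shorter string with UTF-16 NUL characters is also an assumption and may not suit every resource type. The sketch also shows why `encode('utf-16')` fails to match: it prepends a BOM that the file does not contain.

```python
def replace_utf16_string(data: bytes, old: str, new: str) -> bytes:
    # Windows resource strings are UTF-16 little-endian without a BOM.
    old_b = old.encode("utf-16-le")
    new_b = new.encode("utf-16-le")
    if len(new_b) > len(old_b):
        raise ValueError("replacement is longer than the original string")
    # Assumption: pad with UTF-16 NULs so the byte length (and every offset
    # after the string) stays unchanged.
    new_b = new_b.ljust(len(old_b), b"\x00")
    if old_b not in data:
        raise ValueError("string not found")
    return data.replace(old_b, new_b)


# Usage with a made-up blob standing in for the file contents.
blob = b"\x00\x01" + "AAAAAAAAAA".encode("utf-16-le") + b"\x02\x03"
patched = replace_utf16_string(blob, "AAAAAAAAAA", "BBBBB")
print(len(patched) == len(blob))  # True

# encode("utf-16") adds a BOM (b"\xff\xfe" or b"\xfe\xff"), so this search misses.
bom_version_found = "AAAAAAAAAA".encode("utf-16") in blob
print(bom_version_found)  # False
```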

'utf-8' codec can't decode byte 0x97 in position 14: invalid start byte

I want to download a directory from my server when a button is clicked. The directory should be downloaded in zip format. I am using Django and Python. I tried the same code earlier in a Python 2 venv; in a Python 3 venv, the same code gives a 'utf-8' codec can't decode byte error. The zip of the directory is created successfully, but when I press the download button on my website it throws the error above.
#login_required
def logs_folder_index(request):
    user = request.user
    if not is_moderator(user):
        raise Http404("You are not allowed to see this page.")
    else:
        if os.path.exists('Experiments.zip'):
            os.remove('Experiments.zip')
        zipf = zipfile.ZipFile('Experiments.zip','w',zipfile.ZIP_DEFLATED)
        path = settings.BASE_DIR + '/experiments/'
        zipdir(path,zipf)
        zipf.close()
        zip_file = open('Experiments.zip','r')
        response = HttpResponse(zip_file,
                                content_type='application/force-download')
        response['Content-Disposition'] = 'attachment; filename="{0}"'\
            .format('Experiments.zip')
        return response
Can someone please help me with this problem.
You read the file as a text stream (since the mode is 'r', not 'rb'). A zip file is binary data, not text encoded in UTF-8 (or any other text codec), so the reader will eventually hit a byte sequence that cannot be decoded (or that decodes to nonsense). You should therefore read it as a binary file:
#login_required
def logs_folder_index(request):
    user = request.user
    if not is_moderator(user):
        raise Http404("You are not allowed to see this page.")
    elif os.path.exists('Experiments.zip'):
        os.remove('Experiments.zip')
    with zipfile.ZipFile('Experiments.zip','w',zipfile.ZIP_DEFLATED) as zipf:
        path = settings.BASE_DIR + '/experiments/'
        zipdir(path,zipf)
    with open('Experiments.zip','rb') as stream:
        response = HttpResponse(stream, content_type='application/force-download')
        response['Content-Disposition'] = 'attachment; filename="Experiments.zip"'
        return response
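The question doesn't show `zipdir`, so here is a hypothetical stand-in that builds the archive in memory with `io.BytesIO` instead of writing Experiments.zip into the working directory. Nothing is ever read back in text mode, and concurrent requests may no longer remove each other's file. The trade-off is that the whole archive is held in memory, which may not suit very large directories.

```python
import io
import os
import tempfile
import zipfile


def zip_directory_to_bytes(root: str) -> bytes:
    # Hypothetical stand-in for the question's zipdir(): walk root and add
    # every file to an in-memory archive instead of Experiments.zip on disk.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full_path = os.path.join(dirpath, name)
                # Store paths relative to root so BASE_DIR isn't baked into the archive.
                zf.write(full_path, arcname=os.path.relpath(full_path, root))
    return buffer.getvalue()


# Usage with a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    with open(os.path.join(root, "b.txt"), "w") as fh:
        fh.write("b")
    with open(os.path.join(root, "sub", "a.txt"), "w") as fh:
        fh.write("a")
    archive = zip_directory_to_bytes(root)

with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    names = sorted(zf.namelist())
print(names)  # ['b.txt', 'sub/a.txt']
```

In the view, the result would be returned as `HttpResponse(zip_directory_to_bytes(path), content_type="application/zip")` with the same Content-Disposition header as above.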
