My goal is to parse a series of strings into a series of text files that are compressed as a Zip file and downloaded by a web app using Django's HTTP Response.
Developing locally in PyCharm, my method outputs a Zip file called "123.zip" which contains six individual files named "123_1", "123_2", etc., containing the letters from my phrase, with no problem.
The issue is that when I push the code to my web app and include the Django HttpResponse, the file downloads, but when I go to extract it, it produces "123.zip.cpzg". Extracting that in turn gives me "123.zip(1)", in a frustrating infinite loop. Any suggestions where I'm going wrong?
Code that works locally to produce "123.zip":
def create_text_files1():
    JobNumber = "123"
    z = zipfile.ZipFile(JobNumber + ".zip", mode="w")
    phrase = "A, B, C, D, EF, G"
    words = phrase.split(",")
    x = 0
    for word in words:
        word.encode(encoding="UTF-8")
        x = x + 1
        z.writestr(JobNumber + "_" + str(x) + ".txt", word)
    z.close()
Additional part of the method in my web app:
response = HTTPResponse(z, content_type ='application/zip')
response['Content-Disposition'] = "attachment; filename='" + str(jobNumber) + "_AHTextFiles.zip'"
Take a closer look at the example provided in this answer.
Notice that a StringIO is opened, ZipFile is called with the StringIO as a file-like object, and then, crucially, after the ZipFile is closed, the StringIO's contents are returned in the HttpResponse.
# Open StringIO to grab in-memory ZIP contents
s = StringIO.StringIO()
# The zip compressor
zf = zipfile.ZipFile(s, "w")
# Grab ZIP file from in-memory, make response with correct MIME-type
resp = HttpResponse(s.getvalue(), mimetype="application/x-zip-compressed")
I would recommend a few things in your case.
Use BytesIO for forward compatibility
Take advantage of ZipFile's built in context manager
In your Content-Disposition, be careful of "jobNumber" vs "JobNumber"
Try something like this:
def print_nozzle_txt(request):
    JobNumber = "123"
    phrase = "A, B, C, D, EF, G"
    words = phrase.split(",")
    x = 0
    byteStream = io.BytesIO()
    with zipfile.ZipFile(byteStream, mode='w', compression=zipfile.ZIP_DEFLATED) as zf:
        for word in words:
            word.encode(encoding="UTF-8")
            x = x + 1
            zf.writestr(JobNumber + "_" + str(x) + ".txt", word)
    response = HttpResponse(byteStream.getvalue(), content_type='application/x-zip-compressed')
    response['Content-Disposition'] = "attachment; filename='" + str(JobNumber) + "_AHTextFiles.zip'"
    return response
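Note that byteStream.getvalue() is only read after the with block exits, so the ZIP's central directory has already been written when the response body is built; calling getvalue() inside the with block would typically give you a truncated, unreadable archive.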
Related
I am trying to download files using Python and then add lines at the end of the downloaded files, but it returns an error:
f.write(data + """<auth-user-pass>
TypeError: can't concat str to bytes
Edit: Thanks, it works now when I do this b"""< auth-user-pass >""", but I only want to add the string at the end of the file. When I run the code, it adds the string for every line.
I also tried something like this but it also did not work: f.write(str(data) + "< auth-user-pass >")
here is my full code:
import requests
from multiprocessing.pool import ThreadPool

def download_url(url):
    print("downloading: ", url)
    # assumes that the last segment after the / represents the file name
    # if url is abc/xyz/file.txt, the file name will be file.txt
    file_name_start_pos = url.rfind("/") + 1
    file_name = url[file_name_start_pos:]
    save_path = 'ovpns/'
    complete_path = os.path.join(save_path, file_name)
    print(complete_path)
    r = requests.get(url, stream=True)
    if r.status_code == requests.codes.ok:
        with open(complete_path, 'wb') as f:
            for data in r:
                f.write(data + """<auth-user-pass>
username
password
</auth-user-pass>""")
    return url

servers = [
    "us-ca72.nordvpn.com",
    "us-ca73.nordvpn.com"
]

urls = []
for server in servers:
    urls.append("https://downloads.nordcdn.com/configs/files/ovpn_legacy/servers/" + server + ".udp1194.ovpn")

# Run 5 threads. Each call will take the next element in the urls list.
results = ThreadPool(5).imap_unordered(download_url, urls)
for r in results:
    print(r)
Try this:
import os
import requests
from multiprocessing.pool import ThreadPool

def download_url(url):
    print("downloading: ", url)
    # assumes that the last segment after the / represents the file name
    # if url is abc/xyz/file.txt, the file name will be file.txt
    file_name_start_pos = url.rfind("/") + 1
    file_name = url[file_name_start_pos:]
    save_path = 'ovpns/'
    complete_path = os.path.join(save_path, file_name)
    print(complete_path)
    r = requests.get(url, stream=True)
    if r.status_code == requests.codes.ok:
        with open(complete_path, 'wb') as f:
            for data in r:
                f.write(data)
    return url

servers = [
    "us-ca72.nordvpn.com",
    "us-ca73.nordvpn.com"
]

urls = []
for server in servers:
    urls.append("https://downloads.nordcdn.com/configs/files/ovpn_legacy/servers/" + server + ".udp1194.ovpn")

# Run 5 threads. Each call will take the next element in the urls list.
results = ThreadPool(5).imap_unordered(download_url, urls)

# Append the auth block once per file, after that file has finished downloading
for r in results:
    file_name = r[r.rfind("/") + 1:]
    with open(os.path.join('ovpns/', file_name), 'ab') as f:
        f.write(b"""<auth-user-pass>
username
password
</auth-user-pass>""")
    print(r)
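imap_unordered returns an iterator that yields each URL as its download finishes, so doing the append inside that final loop writes the auth block exactly once per file, and only after the whole file has been downloaded.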
You are opening the file in binary mode, so encode your string before concatenating it. That is, replace

for data in r:
    f.write(data + """<auth-user-pass>
username
password
</auth-user-pass>""")

with

for data in r:
    f.write(data + """<auth-user-pass>
username
password
</auth-user-pass>""".encode())
You open the file for writing in binary mode.
Because of that you can't use normal strings, as the comment from #user56700 said.
You either need to encode the string or open the file another way (e.g. 'a' for appending).
I'm not completely sure, but it's also possible that opening the file for writing in binary deletes the file's existing data. Normally opening a file for writing truncates it, so it's quite possible that you need to change the mode to 'r+b'.
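As a small illustration of both options (the file name here is just a placeholder), appending the block once to an already-downloaded file opened in binary append mode could look like:

# Illustration only: append the auth block once, at the end of an existing file
with open("ovpns/example.ovpn", "ab") as f:              # 'ab' = append, binary
    f.write("<auth-user-pass>\n".encode())                # encode a str to bytes
    f.write(b"username\npassword\n</auth-user-pass>\n")   # or use a bytes literal directly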
I'm trying to figure out a way to return multiple dataframes from a Django view as a Zip, using HttpResponse(zip_file, content_type="application/x-zip-compressed").
I tried this:
import zipfile
import datetime

def makeZipfiles(df_list):
    i = 0
    with zipfile.ZipFile('some_zip.zip', 'w') as csv_zip:
        for dfl in df_list:
            csv_zip.writestr(f"file_{str(i)}.csv", dfl.to_csv(index=False))
            i = i + 1
    return csv_zip
and in the view, I have the following:
zip_file = makeZipfiles(df_list)
response = HttpResponse(zip_file, content_type="application/x-zip-compressed")
return response
But when I try to look at the zip file in the download folder, I get an error that the archive is either in an unknown format or damaged. The exported file is 1KB in size, and when I open it in Notepad I see only this content:
"<zipfile.ZipFile [closed]>"
Please advise if what I am trying to do is feasible and if so, please provide a sample code.
Thank you
I haven't tried this out myself yet, but it seems to fit your demand quite well.
https://georepublic.info/en/blog/2019/zip-files-in-django-admin-and-python/
The author describes in detail how to get .csv files from the different dataframes and zip them into one file for download at the end.
His final code was the following:
file_list = [
    . . .,
    UserInfoResource()
]

def getfiles():
    fileSet = {}
    date_now = dt.datetime.now().strftime('%Y%m%d%H%M')
    for file in file_list:
        dataset = file.export()
        dataset = dataset.csv
        name = file._meta.model.__name__ + date_now
        fileSet[name] = dataset
    return fileSet

def download_zip(request):
    files = getfiles()
    zip_filename = 'Survey_Data' + dt.datetime.now().strftime('%Y%m%d%H%M') + '.zip'
    zip_buffer = io.BytesIO()
    with zipfile.ZipFile(zip_buffer, "a", zipfile.ZIP_DEFLATED, False) as zip_file:
        for k, file in files.items():
            zip_file.writestr(k + '.csv', file)
    zip_buffer.seek(0)
    resp = HttpResponse(zip_buffer, content_type='application/zip')
    resp['Content-Disposition'] = 'attachment; filename = %s' % zip_filename
    return resp
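Adapted to your makeZipfiles (a rough sketch only; the helper name make_zip_response and the output filename are placeholders, and df_list is assumed to be a list of pandas DataFrames), the same in-memory pattern would be:

import io
import zipfile
from django.http import HttpResponse

def make_zip_response(df_list):
    zip_buffer = io.BytesIO()
    # Write each dataframe as a CSV entry while the archive is still open
    with zipfile.ZipFile(zip_buffer, 'w', zipfile.ZIP_DEFLATED) as csv_zip:
        for i, dfl in enumerate(df_list):
            csv_zip.writestr(f"file_{i}.csv", dfl.to_csv(index=False))
    # Return the finished bytes, not the (now closed) ZipFile object
    response = HttpResponse(zip_buffer.getvalue(), content_type="application/x-zip-compressed")
    response['Content-Disposition'] = 'attachment; filename=some_zip.zip'
    return response

The key difference from your version is that the response is built from the buffer's bytes rather than from the csv_zip object itself, which is why your download contained only the string "<zipfile.ZipFile [closed]>".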
I will update this answer if I manage to apply this on my own website.
(Let us know if you found a better answer in the meantime, thanks ;) )
My Django application gets a document from the user, creates a report about it, and writes it to a txt file. The interesting problem is that everything works very well on my macOS. But on Windows, it cannot read some letters and converts them to symbols like é™, ä±. Here is my code:
views.py:
def result(request):
    last_uploaded = OriginalDocument.objects.latest('id')
    original = open(str(last_uploaded.document), 'r')
    original_words = original.read().lower().split()
    words_count = len(original_words)
    open_original = open(str(last_uploaded.document), "r")
    read_original = open_original.read()
    characters_count = len(read_original)
    report_fives = open("static/report_documents/" + str(last_uploaded.student_name) +
                        "-" + str(last_uploaded.document_title) + "-5.txt", 'w', encoding="utf-8")
    # Path to the documents against which the original doc is compared
    path = 'static/other_documents/doc*.txt'
    files = glob.glob(path)
    #endregion
    rows, found_count, fives_count, rounded_percentage_five, percentage_for_chart_five, fives_for_report, founded_docs_for_report = search_by_five(last_uploaded, 5, original_words, report_fives, files)
    context = {
        ...
    }
    return render(request, 'result.html', context)
report txt file:
['universitetindé™', 'té™hsili', 'alä±ram.', 'mé™n'] was found in static/other_documents\doc1.txt.
...
The issue here is that you're calling open() on a file without specifying the encoding. As noted in the Python documentation, the default encoding is platform dependent. That's probably why you're seeing different results in Windows and MacOS.
Assuming that the file itself was actually encoded in UTF-8, just specify that when reading the file:
original = open(str(last_uploaded.document), 'r', encoding="utf-8")
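The same applies to the second open() call (open_original) in your view. A minimal sketch with both reads made explicit, and the handles closed via with, might look like this:

with open(str(last_uploaded.document), 'r', encoding="utf-8") as original:
    original_words = original.read().lower().split()
    words_count = len(original_words)

with open(str(last_uploaded.document), 'r', encoding="utf-8") as open_original:
    read_original = open_original.read()
    characters_count = len(read_original)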
I wrote a script to read PDF metadata to ease a task at work. The current working version is not very usable in the long run:
from pyPdf import PdfFileReader

BASEDIR = ''
PDFFiles = []

def extractor():
    output = open('windoutput.txt', 'r+')
    for file in PDFFiles:
        try:
            pdf_toread = PdfFileReader(open(BASEDIR + file, 'r'))
            pdf_info = pdf_toread.getDocumentInfo()
            #print str(pdf_info) #print full metadata if you want
            x = file + "~" + pdf_info['/Title'] + " ~ " + pdf_info['/Subject']
            print x
            output.write(x + '\n')
        except:
            x = file + '~' + ' ERROR: Data missing or corrupt'
            print x
            output.write(x + '\n')
            pass
    output.close()

if __name__ == "__main__":
    extractor()
Currently, as you can see, I have to manually input the working directory and manually populate the list of PDF files. It also just prints out the data in the terminal in a format that I can copy/paste/separate into a spreadsheet.
I'd like the script to work automatically in whichever directory I throw it in and populate a CSV file for easier use. So far:
from pyPdf import PdfFileReader
import csv
import os

def extractor():
    basedir = os.getcwd()
    extension = '.pdf'
    pdffiles = [filter(lambda x: x.endswith('.pdf'), os.listdir(basedir))]
    with open('pdfmetadata.csv', 'wb') as csvfile:
        for f in pdffiles:
            try:
                pdf_to_read = PdfFileReader(open(f, 'r'))
                pdf_info = pdf_to_read.getDocumentInfo()
                title = pdf_info['/Title']
                subject = pdf_info['/Subject']
                csvfile.writerow([file, title, subject])
                print 'Metadata for %s written successfully.' % (f)
            except:
                print 'ERROR reading file %s.' % (f)
                #output.writerow(x + '\n')
                pass

if __name__ == "__main__":
    extractor()
In its current state it seems to just print a single error message (as in, the error message in the exception, not an error returned by Python) and then stop. I've been staring at it for a while and I'm not really sure where to go from here. Can anyone point me in the right direction?
writerow([file, title, subject]) should be writerow([f, title, subject])
You can use sys.exc_info() to print the details of your error
http://docs.python.org/2/library/sys.html#sys.exc_info
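For instance, a minimal sketch (the division error here is only for illustration; in your loop you would include f in the message):

import sys

try:
    1 / 0
except Exception:
    # sys.exc_info() returns a (type, value, traceback) tuple for the active exception
    exc_type, exc_value, exc_traceback = sys.exc_info()
    print('ERROR: %s %s' % (exc_type, exc_value))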
Did you check that the pdffiles variable contains what you think it does? I was getting a list inside a list... so maybe try:
for files in pdffiles:
    for f in files:
        #do stuff with f
I personally like glob. Notice I add * before the .pdf in the extension variable:
import os
import glob

basedir = os.getcwd()
extension = '*.pdf'
pdffiles = glob.glob(os.path.join(basedir, extension))
Figured it out. The script I used to download the files was saving the files with '\r\n' trailing after the file name, which I didn't notice until I actually ls'd the directory to see what was up. Thanks for everyone's help.
I am using a Python script to copy the data of one file into another:
input_file = open('blind_willie.MP3', 'rb')
contents = input_file.read()
output_file = open('f2.txt', 'wb')
output_file.write(contents)
When I open f2 using a text editor I see these kinds of symbols:
ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿù`‘~ +Pg]Nñòs
Is there a way to see the binary content of the f2 file?
Yes, there is a way to see the binary content of the f2 file, and you have discovered it. Those symbols represent the binary content of the file.
If you'd like to see a human-readable interpretation of the binary content, you'll need something like a hex dump program or hex editor.
On Linux, I use the hd or od -t x1 command.
If you'd like to write your own hex dump command, you might start with one of these:
scapy hexdump()
contents.encode("hex")
http://code.activestate.com/recipes/577243-hex-dump/
http://code.activestate.com/recipes/576945/
Or you could use this code:
def hd(data):
    """ str --> hex dump """
    def printable(c):
        import string
        return c in string.printable and not c.isspace()
    result = ""
    for i in range(0, len(data), 16):
        line = data[i:i+16]
        result += '{0:05x} '.format(i)
        result += ' '.join(c.encode("hex") for c in line)
        result += " " * (50 - len(line) * 3)
        result += ''.join(c if printable(c) else '.' for c in line)
        result += "\n"
    return result
input_file = open('blind_willie.MP3', 'rb')
contents = input_file.read()
output_file = open('f2.txt', 'wb')
output_file.write(hd(contents))
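Note that the c.encode("hex") calls make the hd() helper above Python 2 only. On Python 3, where contents is already a bytes object, a rough equivalent (a flat hex string rather than a formatted dump) can be produced with bytes.hex() or the standard binascii module:

import binascii

with open('blind_willie.MP3', 'rb') as input_file:
    contents = input_file.read()

# Write a human-readable hex representation of the binary data
with open('f2.txt', 'w') as output_file:
    output_file.write(contents.hex())
    # or: output_file.write(binascii.hexlify(contents).decode())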