How to compress Excel buffer into ZIP buffer? - python

I'm trying to compress an Excel BytesIO stream into a ZIP BytesIO stream, but ValueError: stat: embedded null character in path is raised when I use write(). I used pyzipper and pyminizip, but neither worked well.
def __compress_excel__(self, excel_buffer):
    zip_stream = BytesIO()
    password = b'password'
    zip_buffer = pyzipper.AESZipFile(zip_stream, mode='w')
    zip_buffer.setpassword(password)
    zip_buffer.write(excel_buffer.getvalue())  # this line raises the ValueError
    return zip_stream.getvalue()
I want to avoid using a temp file to do this. Regards.
UPDATE
Now, thanks to martineau's comments, the Excel data can be compressed, but I have a new problem: setpassword() from ZipFile doesn't work. A ZIP is created, but when I uncompress it, no password is required.
def __compress_excel__(self, excel_buffer):
    zip_stream = BytesIO()
    password = b'password'
    with ZipFile(zip_stream, mode='w') as zipf:
        zipf.setpassword(password)
        zipf.writestr('excel.xlsx', excel_buffer.getvalue())
    return zip_stream.getvalue()
Regards

Related

I do not want to write and read the same document in python

I have pdf files where I want to extract info only from the first page. My solution is to:
Use PyPDF2 to read from S3 and save only the first page.
Read the same one-page PDF I saved, convert it to base64 and analyse it with AWS Textract.
It works, but I do not like this solution. What is the need to save and still read the exact same file? Can I not use the file directly at runtime?
Here is what I have done that I don't like:
from PyPDF2 import PdfReader, PdfWriter
from io import BytesIO
import boto3

def analyse_first_page(bucket_name, file_name):
    s3 = boto3.resource("s3")
    obj = s3.Object(bucket_name, file_name)
    fs = obj.get()['Body'].read()
    pdf = PdfReader(BytesIO(fs), strict=False)
    writer = PdfWriter()
    page = pdf.pages[0]
    writer.add_page(page)
    # Here is the part I do not like
    with open("first_page.pdf", "wb") as output:
        writer.write(output)
    with open("first_page.pdf", "rb") as pdf_file:
        encoded_string = bytearray(pdf_file.read())
    # Analyse text
    textract = boto3.client('textract')
    response = textract.detect_document_text(Document={"Bytes": encoded_string})
    return response

analyse_first_page(bucket, file_name)
Is there no AWS way to do this? Is there no better way to do this?
You can use BytesIO as an in-memory stream, without writing to a file and reading it back again.
from base64 import b64encode

with BytesIO() as bytes_stream:
    writer.write(bytes_stream)
    bytes_stream.seek(0)
    encoded_string = b64encode(bytes_stream.getvalue())
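The buffer handling can be sketched with the standard library alone; the PDF bytes below are a stand-in for PdfWriter's writer.write(bytes_stream), which accepts any binary file-like object:

```python
from io import BytesIO
from base64 import b64encode

bytes_stream = BytesIO()
# Stand-in for writer.write(bytes_stream)
bytes_stream.write(b"%PDF-1.4 first page only")
bytes_stream.seek(0)

# getvalue() returns the whole buffer regardless of the current position
pdf_bytes = bytes_stream.getvalue()
encoded_string = b64encode(pdf_bytes)
```

As a side note, the boto3 SDK base64-encodes Document bytes for you, so passing pdf_bytes directly to detect_document_text (as the question's original code does) should also work without an explicit b64encode.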

Create a password protected zip file in-memory and write to disk

from io import BytesIO
import zipfile
mem_zip = BytesIO()
with zipfile.ZipFile(mem_zip, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("filename.txt", b"test")
data = mem_zip.getvalue()
with open('/path/to/file/test.zip', 'wb') as f:
f.write(data)
The code snippet above creates a zip file in memory and writes it to disk as a zip file. This works as expected: I can extract it and see the text file with the content "test".
I wish to password protect my zipfile. How do I do that?
I tried using the setpassword method but that had no effect on the output. The file written to disk was not password protected.
with zipfile.ZipFile(mem_zip, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.setpassword(b"test_password")
    zf.writestr("filename.txt", b"test")
I am writing to disk here just to test if the zipfile looks as I expect. My goal is to send the file as an email attachment and I wish to keep the zip file in-memory. So using pyminizip is not an option for me.
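Since the end goal is an email attachment, the in-memory zip bytes can go straight into a message without touching disk. A sketch with the stdlib email package follows (addresses, subject, and filenames are placeholders); note that the encryption itself still needs a third-party library such as pyzipper, because zipfile's setpassword() only applies when reading:

```python
from email.message import EmailMessage
from io import BytesIO
import zipfile

# Build the (unencrypted) zip in memory, as in the question
mem_zip = BytesIO()
with zipfile.ZipFile(mem_zip, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("filename.txt", b"test")

# Attach the raw bytes directly; no file on disk is involved
msg = EmailMessage()
msg["Subject"] = "Report"               # placeholder
msg["From"] = "me@example.com"          # placeholder
msg["To"] = "you@example.com"           # placeholder
msg.set_content("Archive attached.")
msg.add_attachment(mem_zip.getvalue(),
                   maintype="application", subtype="zip",
                   filename="test.zip")
```

The message can then be handed to smtplib's send_message() as usual.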

Why doesn't the password open my zip file in s3 when passed as a bytes object in python?

I have a small but mysterious and unsolvable problem using python to open a password protected file in an AWS S3 bucket.
The password I have been given is definitely correct and I can download the zip to Windows and extract it to reveal the csv data I need.
However I need to code up a process to load this data into a database regularly.
The password has a pattern like this (includes mixed case letters, numbers and a single "#"):-
ABCD#Efghi12324567890
The code below works with other zip files I place in the location with the same password:-
import boto3
import pyzipper
from io import BytesIO
s3_resource = boto3.resource('s3', aws_access_key_id=aws_access_key_id,
                             aws_secret_access_key=aws_secret_access_key)
zip_obj = s3_resource.Object(bucket_name=my_bucket, key=my_folder + my_zip)
buffer = BytesIO(zip_obj.get()["Body"].read())
z = pyzipper.ZipFile(buffer)
my_newfile = z.namelist()[0]
s3_resource.meta.client.upload_fileobj(
    z.open(my_newfile, pwd=b"ABCD#Efghi12324567890"),  # HERE IS THE OPEN COMMAND
    Bucket=my_bucket,
    Key=my_folder + my_newfile)
I am told the password is incorrect:-
RuntimeError: Bad password for file 'ThisIsTheFileName.csv'
I resorted to using pyzipper rather than zipfile, since zipfile didn't support the compression method of the file in question:-
That compression method is not supported
In 7-zip I can see the following for the zip file:-
Method: AES-256 Deflate
Characteristics: WzAES: Encrypt
Host OS: FAT
So to confirm:-
-The password is definitely correct (can open it manually)
-The code seems ok - it opens my zip files with the same password
What is the issue here please and how do I fix it?
You would have my sincere thanks!
Phil
With some help from a colleague and a useful article, I now have this working.
Firstly, as per the compression method, I found it necessary to use the AESZipFile() method of pyzipper (although this method also seemed to work on other compression types).
Secondly, AESZipFile() apparently accepts a BytesIO object as well as a file path, presumably because this is what it sees when it opens the file.
Therefore the zip file can be extracted in situ, without having to download it first.
This method creates the pyzipper object, which you can then read by specifying the file name and the password.
The final code looks like this:-
import pyzipper
import boto3
from io import BytesIO
my_bucket = ''
my_folder = ''
my_zip = ''
my_password = b''
aws_access_key_id=''
aws_secret_access_key=''
s3 = boto3.client('s3', aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key)
s3_file = s3.get_object(Bucket=my_bucket, Key=my_folder + my_zip)
s3_iodata = BytesIO(s3_file['Body'].read())
f = pyzipper.AESZipFile(s3_iodata)
my_file = f.namelist()[0]
file_content = f.read(my_file, pwd = my_password)
response = s3.put_object(Body=file_content, Bucket=my_bucket, Key=my_folder + my_file)
Here is an article that was useful:-
https://www.linkedin.com/pulse/extract-files-from-zip-archives-in-situ-aws-s3-using-python-tom-reid
I hope this is helpful to someone,
Phil

How to convert base64 bytes to an image in python

I am facing a problem converting base64 bytes to an image. I tried this:
filename = "chart.png"
with open(filename, "wb") as f:
    f.write(base64.decodebytes(data))
where the base64 bytes are stored in data (like this: b'iV.....'). After this I tried uploading "filename" to my Django database by doing
gimage.objects.create(name='Rufus5', pic1=filename)
This creates an empty file chart.png in my Django database.
You can use Django's built-in functionality:
from django.core.files.images import ImageFile
gimage.objects.create(name='Rufus5', pic1=ImageFile(open("chart.png", "rb")))
Try this:
import base64
imgdata = base64.b64decode(imgstring)
filename = 'some_image.jpg' # I assume you have a way of picking unique filenames
with open(filename, 'wb') as f:
    f.write(imgdata)
# f gets closed when you exit the with statement
# Now save the value of filename to your database
The answer was found in: How to convert base64 string to image?
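The decode step itself is easy to verify in isolation; the empty file in the database likely comes from passing a plain filename string to the ImageField rather than the file's content. A minimal round-trip sketch of the decoding (the image bytes are fake placeholders, not a real PNG):

```python
import base64

original = b"\x89PNG\r\n\x1a\nfake image bytes"  # placeholder payload
data = base64.b64encode(original)                # what you receive, e.g. b'iVBOR...'
decoded = base64.decodebytes(data)               # equivalent to base64.b64decode here
assert decoded == original                       # the round trip is lossless
```

Real PNG data starts with the \x89PNG signature, which is why base64-encoded PNGs always begin with b'iVBOR...'.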

How to add a temporary .docx file to a zip archive in django

Here is my code for downloading a zip file containing a .docx file:
def reportsdlserien(request):
    selected_sem = request.POST.get("semester", "SS 2016")
    docx_title = "Report_in_%s.docx" % selected_sem.replace(' ', '_')
    document = Document()
    f = io.BytesIO()
    zip_title = "Archive_in_%s.zip" % selected_sem.replace(' ', '_')
    zip_arch = ZipFile(f, 'a')
    document.add_heading("Report in " + selected_sem, 0)
    document.add_paragraph(date.today().strftime('%d %B %Y'))
    document.save(docx_title)
    zip_arch.write(docx_title)
    zip_arch.close()
    response = HttpResponse(
        f.getvalue(),
        content_type='application/zip'
    )
    response['Content-Disposition'] = 'attachment; filename=' + zip_title
    return response
The only problem is that it also creates the .docx file on disk, which I don't need. I wanted to use BytesIO for the docx file too, but I can't add it to the archive; the command zip_arch.write(BytesIOdocxfile) doesn't work. Is there another command to do this?
Thank you!
Use the writestr() function to add some bytes to the archive:

data = StringIO()
document.save(data)  # Or however the library requires you to do this.
zip_arch.writestr(docx_title, bytes(data.getvalue()))

I've only done this with StringIO, but I don't see why BytesIO wouldn't work just as well.
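For a .docx, BytesIO is indeed the right choice, since the document is binary data (python-docx's document.save() accepts a file-like object). A stdlib-only sketch of the pattern, with placeholder bytes standing in for the saved document:

```python
from io import BytesIO
from zipfile import ZipFile

docx_stream = BytesIO()
docx_stream.write(b"fake docx bytes")  # stand-in for document.save(docx_stream)

zip_stream = BytesIO()
with ZipFile(zip_stream, 'w') as zip_arch:
    # writestr() puts raw bytes into the archive under the given member name,
    # so no temporary file is ever created on disk
    zip_arch.writestr('Report.docx', docx_stream.getvalue())

payload = zip_stream.getvalue()  # bytes ready for HttpResponse
```

The payload can then be returned exactly as in the view above, via HttpResponse(payload, content_type='application/zip').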
