PyPDF2.PdfFileWriter addAttachment not working - python

Based on https://programtalk.com/python-examples/PyPDF2.PdfFileWriter/, example 2, I am trying to add an attachment to a PDF file.
Here is my code I am trying to run:
import os
from django.conf import settings
from PyPDF2 import PdfFileReader, PdfFileWriter
...
doc = os.path.join(settings.BASE_DIR, "../media/SC/myPDF.pdf")
reader = PdfFileReader(doc, "rb")
writer = PdfFileWriter()
writer.appendPagesFromReader(reader)
writer.addAttachment("The filename to display", "The data in the file")
with open(doc, "wb") as fp:
writer.write(fp)
When I run this code, I get: "TypeError: a bytes-like object is required, not 'str'".
If I replace
with open(doc, 'wb') as fp:
    writer.write(fp)
by:
with open(doc, 'wb') as fp:
    writer.write(b'fp')
I get this error: "'bytes' object has no attribute 'write'".
And if I try:
with open(doc, 'w') as fp:
    writer.write(fp)
I get this error: "write() argument must be str, not bytes"
Can anyone help me?

The second argument to addAttachment has to be a bytes-like object. You can do that by encoding the string:
writer.addAttachment("The filename to display", "The data in the file".encode())
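If the attachment is an actual file on disk, you can also read it in binary mode so the data is already bytes. A minimal sketch, writing the result to a new output file; attachment.csv and the output name are just example names:
from PyPDF2 import PdfFileReader, PdfFileWriter
with open("myPDF.pdf", "rb") as src:
    reader = PdfFileReader(src)
    writer = PdfFileWriter()
    writer.appendPagesFromReader(reader)
    # addAttachment expects bytes for the file data, so read the attachment in binary mode
    with open("attachment.csv", "rb") as attachment:
        writer.addAttachment("attachment.csv", attachment.read())
    # write to a new file so the source PDF stays intact while the reader still needs it
    with open("myPDF_with_attachment.pdf", "wb") as out:
        writer.write(out)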

Related

Python code to create JSON with Marathi language Giving Unreadable JSON

I am trying to create a JSON file using Python code. The file is created successfully with English text, but it does not work properly with Marathi text.
Please check out the code:
import os
import json
jsonFilePath = "E:/file/"
captchaImgLocation = "E:/file/captchaimg/"
path_to_tesseract = r"C:/Program Files/Tesseract-OCR/tesseract.exe"
image_path = r"E:/file/captchaimg/captcha.png"
x = {
    "FName": "प्रवीण",
}
# convert into JSON:
y = json.dumps(x, ensure_ascii=False).encode('utf8')
# the result is a JSON string:
print(y.decode())
completeName = os.path.join(jsonFilePath, "searchResult_Unicode.json")
print(str(completeName))
file1 = open(completeName, "w")
file1.write(str(y))
file1.close()
Output on console:
{"FName": "प्रवीण"}
File created inside folder like this:
b'{"FName": "\xe0\xa4\xaa\xe0\xa5\x8d\xe0\xa4\xb0\xe0\xa4\xb5\xe0\xa5\x80\xe0\xa4\xa3"}'
There is no runtime or compile-time error, but the JSON is created in the format above.
Please suggest a solution.
Open the file in the encoding you need and then json.dump to it:
import os
import json
data = { "FName": "प्रवीण" }
# Writing human-readable. Note some text viewers on Windows required UTF-8 w/ BOM
# to *display* correctly. It's not a problem with writing, but you can use
# encoding='utf-8-sig' to hint to those programs that the file is UTF-8 if
# you see that issue. MUST use encoding='utf8' to read it back correctly.
with open('out.json', 'w', encoding='utf8') as f:
    json.dump(data, f, ensure_ascii=False)
# Writing non-human-readable for non-ASCII, but others will have few
# problems reading it back into Python because all common encodings are ASCII-compatible.
# Using the default encoding this will work. I'm being explicit about encoding
# because it is good practice.
with open('out2.json', 'w', encoding='ascii') as f:
    json.dump(data, f, ensure_ascii=True)  # True is the default anyway
# reading either one is the same
with open('out.json', encoding='utf8') as f:
    data2 = json.load(f)
with open('out2.json', encoding='utf8') as f:  # UTF-8 is ASCII-compatible
    data3 = json.load(f)
# Round-tripping test
print(data == data2, data2)
print(data == data3, data3)
Output:
True {'FName': 'प्रवीण'}
True {'FName': 'प्रवीण'}
out.json (UTF-8-encoded):
{"FName": "प्रवीण"}
out2.json (ASCII-encoded):
{"FName": "\u092a\u094d\u0930\u0935\u0940\u0923"}
You have encoded the JSON string, so you must either open the file in binary mode or decode the JSON before writing it to the file:
file1 = open(completeName, "wb")
file1.write(y)
or
file1 = open(completeName, "w")
file1.write(y.decode('utf-8'))
Doing
file1 = open(completeName, "w")
file1.write(str(y))
writes the string representation of the bytes to the file, which is always the wrong thing to do.
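For illustration, the difference looks like this (the dictionary is just the example from the question):
import json
y = json.dumps({"FName": "प्रवीण"}, ensure_ascii=False).encode('utf8')
print(str(y))            # b'{"FName": "\xe0\xa4\xaa..."}' -- the repr of the bytes, with the b'' wrapper and escapes
print(y.decode('utf-8')) # {"FName": "प्रवीण"} -- the actual text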
Do you want your JSON to be human-readable? It's usually bad practice, since you would never know what encoding to use.
You can write/read your json files with the json module without worrying about encoding:
import json
json_path = "test.json"
x = {"FName": "प्रवीण"}
with open(json_path, "w") as outfile:
json.dump(x, outfile, indent=4)
with open(json_path, "r") as infile:
print(json.load(infile))

python - the process cannot access the file because it is being used by another process

I have the code below, in which I am trying to hash an image and write the output to a text file. Later I encrypt the text file and need to delete the old version of it. However, I am receiving the following error: "The process cannot access the file because it is being used by another process". I have tried to use time.sleep(3) to delay the code, but I am still receiving the same error.
Or should I be using .close(), and how do I set that up?
Can someone please advise?
import os
import hashlib
import logging
import time
import cryptography
from cryptography.fernet import Fernet
key = Fernet.generate_key()
file = open('key.key', 'wb') # Open the file as wb to write bytes
file.write(key) # The key is type bytes still
file.close()
file = open('key.key', 'rb') # Open the file as rb to read bytes
key = file.read() # The key will be type bytes
file.close()
logging.basicConfig(filename='InitializationHash.txt', level=logging.INFO,
                    format='%(message)s')
def hash_image(filepath):
    with open(filepath, 'rb') as f:
        file_bytes = f.read()
        hash_text = hashlib.sha256(file_bytes).hexdigest()
        logging.info(hash_text)
def get_one_image(file_Name):
    filepath = os.path.abspath(file_Name)
    hash_image(filepath)
if __name__ == '__main__':
    get_one_image("initializationlandmark.png")
    input_file = 'InitializationHash.txt'
    output_file = 'InitializationHash.encrypted'
    with open(input_file, 'rb') as f:
        data = f.read()  # Read the bytes of the input file
    fernet = Fernet(key)
    encrypted = fernet.encrypt(data)
    with open(output_file, 'wb') as f:
        f.write(encrypted)  # Write the encrypted bytes to the output file
    os.remove("InitializationHash.txt")

aws lambda python append to file from S3 object

I am trying to write the contents read from an S3 object to a file. I am getting an error while doing so.
object =s3.get_object(Bucket=bucket_name, Key="toollib/{0}/{1}/stages/{0}.groovy".format(tool,platform))
print(object)
jenkinsfile = object['Body'].read()
print(jenkinsfile)
basepath = '/mnt/efs/{0}/{1}/{2}/'.format(orderid, platform, technology)
filename = basepath+fileName
print(filename)
#file1=open(filename, "a")
with open(filename, 'a') as file:
    file.write(jenkinsfile)
Error : "errorMessage": "write() argument must be str, not bytes"
Opening the file in binary mode should do the trick:
with open(filename, 'ab') as file:
    file.write(jenkinsfile)
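Alternatively, if you want to keep the file in text mode, you could decode the bytes returned by the S3 read before writing. A small sketch reusing the question's object and filename variables, and assuming the object contains UTF-8 text:
jenkinsfile = object['Body'].read().decode('utf-8')  # bytes -> str
with open(filename, 'a') as file:
    file.write(jenkinsfile)  # text mode now matches the str data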

Open file from zip without extracting it in Python?

I am working on a script that fetches a zip file from a URL using the requests library. That zip file contains a csv file. I'm trying to read that csv file without saving it, but while parsing it I get this error: _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
import csv
import requests
from io import BytesIO, StringIO
from zipfile import ZipFile
response = requests.get(url)
zip_file = ZipFile(BytesIO(response.content))
files = zip_file.namelist()
with zip_file.open(files[0]) as csvfile:
    csvreader = csv.reader(csvfile)
    # _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
    for row in csvreader:
        print(row)
Try this:
import pandas as pd
import requests
from io import BytesIO, StringIO
from zipfile import ZipFile
response = requests.get(url)
zip_file = ZipFile(BytesIO(response.content))
files = zip_file.namelist()
with zip_file.open(files[0]) as csvfile:
    print(pd.read_csv(csvfile, encoding='utf8', sep=","))
As #Aran-Fey alluded to:
import zipfile
import csv
import io
with open('/path/to/archive.zip', 'rb') as f:  # the archive itself must be opened in binary mode
    with zipfile.ZipFile(f) as zf:
        csv_filename = zf.namelist()[0]  # see namelist() for the list of files in the archive
        with zf.open(csv_filename) as csv_f:
            csv_f_as_text = io.TextIOWrapper(csv_f)
            reader = csv.reader(csv_f_as_text)
csv.reader (and csv.DictReader) require a file-like object opened in text mode. Normally this is not a problem when simply open(...)ing a file in 'r' mode, since, as the Python 3 docs say, text mode is the default: "The default mode is 'r' (open for reading text, synonym of 'rt')". But if you try 'rt' with ZipFile.open(), you'll see an error, because ZipFile.open() only accepts mode "r" or "w":
with zf.open(csv_filename, 'rt') as csv_f:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
...
ValueError: open() requires mode "r" or "w"
That's what io.TextIOWrapper is for -- for wrapping byte streams to be readable as text, decoding them on the fly.
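Putting that together with the requests-based code from the question, a sketch of reading the CSV rows straight out of the downloaded archive (url is whatever you were already fetching, and UTF-8 is assumed for the CSV):
import csv
import io
import requests
from io import BytesIO
from zipfile import ZipFile
response = requests.get(url)
zip_file = ZipFile(BytesIO(response.content))
csv_name = zip_file.namelist()[0]
with zip_file.open(csv_name) as raw:
    # wrap the binary stream so csv.reader receives text, decoded on the fly
    text_file = io.TextIOWrapper(raw, encoding='utf-8')
    for row in csv.reader(text_file):
        print(row)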

get file size of the file from the file object

I need to know the size of the file based on the file object.
import csv
import os
with open("test.csv", "rb") as infile:
reader = csv.reader(infile)
print reader
filesize(reader)
def filesize(reader):
os.getsize(reader) #And i need work with reader for more details.so I must need to pass a reader or file object
When I run this, the output is:
<_csv.reader object at 0x7f5644584980>
From this file object, how do I get the size of the file?
I also checked this question (size of an open file object), but those are not attributes of the csv reader.
EDIT: When I use those two built-in functions, I get these errors:
AttributeError: '_csv.reader' object has no attribute 'seek'
AttributeError: '_csv.reader' object has no attribute 'tell'
You can use os.path.getsize or os.stat
import os
os.path.getsize('test.csv')
OR
os.stat('test.csv').st_size
Both return the size in bytes.
Adding this answer because it actually answers the question asked about using the file object directly:
import csv
import os
with open("test.csv", "rb") as infile:
reader = csv.reader(infile)
print reader
infile.seek(0, os.SEEK_END)
filesize = infile.tell()
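Note that seeking to the end moves the read position, so seek back to 0 if you still need to iterate the reader afterwards. Alternatively, os.fstat on the open file's descriptor reports the size without moving the position at all; a small sketch reusing the same test.csv:
import csv
import os
with open("test.csv", "rb") as infile:
    reader = csv.reader(infile)
    # fstat looks at the open file descriptor, so the read position
    # (and therefore the reader) is left untouched
    filesize = os.fstat(infile.fileno()).st_size
    print(filesize)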
What's wrong with os.path.getsize?
With your code:
import os
import csv
with open("test.csv", "rb") as infile:
reader = csv.reader(infile)
print os.path.getsize(infile.name)
The size is in bytes.
