Using Python and openpyxl, I have code that goes into a workbook every day, adds a bunch of information to some cells, and then saves and closes it. My problem is that if this workbook is open, the code doesn't work (obviously).
By making the workbook shared (multiple users can edit at once) I can overcome this problem; however, after the code runs once, Python saves the file and reverts it to a closed, unshared workbook.
Does anyone know if openpyxl can save a workbook as shared? I'm not finding anything online.
Thanks in advance for your help.
It seems that when openpyxl saves an Excel workbook, the docProps/app.xml file inside it is wiped and left with only minimal information.
A quick (and dirty) solution is to use zipfile to grab this information and transfer it into the newly saved file.
import zipfile, openpyxl

def shared_pyxl_save(file_path, workbook):
    """
    `file_path`: path to the shared file you want to save
    `workbook`: the object returned by openpyxl.load_workbook()
    """
    # Read every entry of the original file into memory before openpyxl
    # overwrites it.
    zin = zipfile.ZipFile(file_path, 'r')
    buffers = []
    for item in zin.infolist():
        buffers.append((item, zin.read(item.filename)))
    zin.close()

    # Let openpyxl save normally (this is what wipes docProps/app.xml)...
    workbook.save(file_path)

    # ...then write the original entries back over the saved file.
    zout = zipfile.ZipFile(file_path, 'w')
    for item, buffer in buffers:
        zout.writestr(item, buffer)
    zout.close()
Here's my answer, posted on another page as well, but this page is the original.
Well... after playing with it back and forth, for some weird reason zipfile.infolist() does contain the sheet data as well, so here's my way to fine-tune it, using the shared_pyxl_save example the previous gentleman provided.
Basically, instead of letting the old file's copy of the sheet data override the newly saved data, keep the new sheet data:
def shared_pyxl_save(file_path, workbook):
    """
    `file_path`: path to the shared file you want to save
    `workbook`: the object returned by openpyxl.load_workbook()
    """
    # Keep every entry of the original file except the sheet data.
    zin = zipfile.ZipFile(file_path, 'r')
    buffers = []
    for item in zin.infolist():
        if "sheet1.xml" not in item.filename:
            buffers.append((item, zin.read(item.filename)))
    zin.close()

    workbook.save(file_path)

    # Loop through again to grab the freshly saved sheet1.xml and put it
    # into the buffer, otherwise the file will show up with an error.
    zin2 = zipfile.ZipFile(file_path, 'r')
    for item in zin2.infolist():
        if "sheet1.xml" in item.filename:
            buffers.append((item, zin2.read(item.filename)))
    zin2.close()

    # Finally, save the file.
    zout = zipfile.ZipFile(file_path, 'w')
    for item, buffer in buffers:
        zout.writestr(item, buffer)
    zout.close()
    workbook.close()
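For reference, a minimal usage sketch (the workbook name and the cell edit are invented for illustration):

# Hypothetical usage: load the shared workbook, edit cells, then save with
# shared_pyxl_save() so the shared-workbook metadata survives the save.
import openpyxl

wb = openpyxl.load_workbook('shared_report.xlsx')  # placeholder filename
ws = wb.active
ws['A1'] = 'daily value'
shared_pyxl_save('shared_report.xlsx', wb)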
I have a Django blog and want to download a backup zipfile with all the entries. The blog post text content is stored in the database.
I wrote this code with the goal of getting the zipfile to hold a bunch of .txt files in the main zip directory, but all it does is output a single corrupted zip file. It cannot be unzipped, but for some reason it can be opened in Word, which shows all of the blog post text mashed together.
def download_backups(request):
    zip_filename = "test.zip"
    s = BytesIO()
    zf = zipfile.ZipFile(s, "w")
    blogposts = Blog.objects.all()
    for blogpost in blogposts:
        filename = blogpost.title + ".txt"
        zf.writestr(filename, blogpost.content)
    resp = HttpResponse(s.getvalue())
    resp['Content-Disposition'] = 'attachment; filename=%s' % zip_filename
    return resp
Any help is appreciated.
Based on this answer to another question, you may be having an issue with the read mode. You'll also need to call zf.close(), either explicitly or implicitly, before the file will actually be complete.
I think there's a simpler way of handling this using a temporary file, which should have the advantage of not needing to fit all of the file's contents in memory.
from tempfile import TemporaryFile
from zipfile import ZipFile

with TemporaryFile() as tf:
    with ZipFile(tf, mode="w") as zf:
        zf.writestr("file1.txt", "The first file")
        zf.writestr("file2.txt", "A second file")
    tf.seek(0)
    print(tf.read())
The with blocks here will result in your temp file going out of scope and being deleted, and zf.close being called implicitly before you attempt to read the file.
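Applied to the original view, a hedged sketch of the fix (Blog and download_backups come from the question; the content_type is an assumption):

from tempfile import TemporaryFile
from zipfile import ZipFile
from django.http import HttpResponse

def download_backups(request):
    # Build the archive in a temp file so it never has to fit in memory.
    with TemporaryFile() as tf:
        with ZipFile(tf, mode="w") as zf:           # closing zf finalizes the archive
            for blogpost in Blog.objects.all():     # Blog model from the question
                zf.writestr(blogpost.title + ".txt", blogpost.content)
        tf.seek(0)
        resp = HttpResponse(tf.read(), content_type="application/zip")
    resp['Content-Disposition'] = 'attachment; filename=test.zip'
    return resp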
If the goal here is just to back up the data rather than using this specific format, though, I'd suggest using the built-in dumpdata management command. You can call it from code if you want to serve the results through a view like this.
I'm not sure how to word my question exactly, and I have seen some similar questions asked but not exactly what I'm trying to do. If there already is a solution please direct me to it.
Here is what I'm trying to do:
At my work, we have a few pkgs we've built to handle various data types. One I am working with is reading in a csv file into a std_io object (std_io is our all-purpose object class that reads in any type of data file).
I am trying to connect this to another pkg I am writing, so I can make an object in the new pkg and convert it to a std_io object.
The problem is, the std_io object is meant to read an actual file, not take in an object. To get around this, I can basically write my data to a temp.csv file, then read it into a std_io object.
I am wondering if there is a way to eliminate this step of writing the temp.csv file.
Here is my code:
x #my object
df = x.to_df() #object class method to convert to a pandas dataframe
df.to_csv('temp.csv') #write data to a csv file
std_io_obj = std_read('temp.csv') #read csv file into a std_io object
Is there a way to basically pass what the output of writing the csv file would be directly into std_read? Does this make sense?
The only reason I want to do this is to avoid having to code additional functionality into either of the pkgs to directly accept an object as input.
Hope this was clear, and thanks to anyone who contributes.
For those interested, or who may have this same kind of issue/objective, here's what I did to solve this problem.
I basically just created a temporary named file, linked a .csv filename to this temp file, then passed it into my std_read function which requires a csv filename as an input.
This basically tricks the function into thinking it's taking the name of a real file as an input, and it just opens it as usual and uses csvreader to parse it up.
This is the code:
import tempfile
import os

x  # my object I want to convert to a std_io object
text = x.to_df().to_csv()  # convert to a pandas DataFrame, then generate the 'text' of a csv file
filename = 'temp.csv'

# Create the temp file in the current directory so the hard link below
# stays on the same filesystem (a requirement for os.link).
with tempfile.NamedTemporaryFile(dir=os.path.dirname('.')) as f:
    f.write(text.encode())
    f.flush()  # make sure the buffered text reaches the file before it is read
    os.link(f.name, filename)  # give the temp file a .csv name via a hard link
    stdio_obj = std_read(filename)
    os.unlink(filename)  # remove the extra name; the temp file itself is deleted on exit
FYI - the std_read function essentially just opens the file the usual way and passes it into csv.reader:

with open(filename, 'r') as f:
    rdr = csv.reader(f)
My script generates a PDF (a PyPDF2.pdf.PdfFileWriter object) and stores it in a variable.
I need to work with it as a file-like object further along in the script. But right now I have to write it to the HDD first, then open it as a file to work with it.
To avoid these unnecessary write/read operations I found many candidate solutions - StringIO, BytesIO, and so on - but I cannot find which one fits my case.
As far as I understand, I need to "convert" (or write to RAM) the PyPDF2.pdf.PdfFileWriter object into a file-like object, so I can work with it directly.
Or is there another method that fits my case exactly?
UPDATE - here is a code sample:
from pdfrw import PdfReader, PdfWriter, PageMerge
from PyPDF2 import PdfFileReader, PdfFileWriter

red_file = PdfFileReader(open("file_name.pdf", 'rb'))
large_pages_indexes = [1, 7, 9]

large = PdfFileWriter()
for i in large_pages_indexes:
    p = red_file.getPage(i)
    large.addPage(p)

# here the final data has to be written (I would like to avoid that)
with open("virtual_file.pdf", 'wb') as tmp:
    large.write(tmp)

# here I need to read the exported "virtual_file.pdf" (I would like to avoid that too)
with open("virtual_file.pdf", 'rb') as tmp:
    pdf = PdfReader(tmp)  # here I start working with this file using another module, "pdfrw"
    print(pdf)
To avoid slow disk I/O it appears you want to replace
with open("virtual_file.pdf", 'wb') as tmp:
large.write(tmp)
with open("virtual_file.pdf", 'rb') as tmp:
pdf = PdfReader(tmp)
with
import io

buf = io.BytesIO()
large.write(buf)
buf.seek(0)
pdf = PdfReader(buf)
Also, buf.getvalue() is available to you.
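For context, a hedged end-to-end sketch based on the question's code, with the disk round trip replaced by the in-memory buffer (PdfReader takes the buffer exactly as it took the open file object in the question):

import io
from PyPDF2 import PdfFileReader, PdfFileWriter
from pdfrw import PdfReader

red_file = PdfFileReader(open("file_name.pdf", 'rb'))
large = PdfFileWriter()
for i in [1, 7, 9]:              # page indexes from the question
    large.addPage(red_file.getPage(i))

buf = io.BytesIO()               # in-memory stand-in for "virtual_file.pdf"
large.write(buf)
buf.seek(0)                      # rewind so pdfrw reads from the start
pdf = PdfReader(buf)
print(pdf)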
I am trying to create a basic mathematical quiz and need to be able to store the name of the user next to their score. To ensure that I can edit the data dynamically regardless of the length of the user's name or the number of digits in their score, I decided to separate the name and score with a comma and use the split function. I'm new to file handling in Python, so I don't know if I am using the wrong mode ("r+"), but when I complete the quiz, my score is not recorded at all; nothing is added to the file. Here is my code:
for line in class_results.read():
    if student_full_name in line:
        student = line.split(",")
        student[1] = correct
        line.replace(line, "{},{}".format(student_full_name, student[1]))
    else:
        class_results.write("{},{}".format(student_full_name, correct))
Please let me know how I can get this system to work. Thank you in advance.
Yes, r+ opens the file for both reading and writing. To summarize:
r when the file will only be read
w for only writing (an existing file with the same name will be erased)
a opens the file for appending; any data written to the file is automatically added to the end.
Instead of comma separation, I recommend taking advantage of JSON or YAML syntax; it fits better in this case.
scores.json:
{
"student1": 12,
"student2": 798
}
The solution:
import json

with open(filename, "r+") as data:
    scores_dict = json.loads(data.read())
    scores_dict[student_full_name] = correct  # if the student already exists the score is updated; otherwise it is added
    data.seek(0)
    data.write(json.dumps(scores_dict))
    data.truncate()
scores.yml will look as follows:
student1: 45
student2: 7986
Solution:
import yaml

with open(filename, "r+") as data:
    scores_dict = yaml.safe_load(data)  # yaml has no loads(); safe_load parses the stream directly
    scores_dict[student_full_name] = correct  # if the student already exists the score is updated; otherwise it is added
    data.seek(0)
    data.write(yaml.dump(scores_dict, default_flow_style=False))
    data.truncate()
To install the yaml Python package: pip install pyyaml
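To make the pattern reusable, here is a hedged sketch wrapping the JSON variant in a helper function (the function name and the sample call are made up; it assumes the file already exists and holds a JSON object, even just {}):

import json

def record_score(filename, student_full_name, correct):
    # Read the existing scores, update or insert this student's score,
    # and rewrite the file in place.
    with open(filename, "r+") as data:
        scores_dict = json.load(data)
        scores_dict[student_full_name] = correct
        data.seek(0)
        json.dump(scores_dict, data)
        data.truncate()

record_score("scores.json", "student1", 14)  # hypothetical call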
Modifying a file in place is generally a poor way to do this. It risks errors causing the resulting file to be half new data, half old, with the split point being corrupted. The usual pattern is to write to a new file, then atomically replace the old file with the new file, so either you have the entire original old file and a partial new file, or the new file, not a mish-mash of both.
Given your example code, here is how you would fix it up to do that:
import csv
import os
from tempfile import NamedTemporaryFile

origfile = '...'
origdir = os.path.dirname(origfile)

# Open the original file for read, and a tempfile in the same directory for write
with open(origfile, newline='') as inf, NamedTemporaryFile('w', dir=origdir, newline='') as outf:
    old_results = csv.reader(inf)
    new_results = csv.writer(outf)
    for name, oldscore in old_results:
        if name == student_full_name:
            # Found our student, replace their score
            new_results.writerow((name, correct))
            # Then write out the rest of the lines unchanged
            new_results.writerows(old_results)
            # and we're done
            break
        else:
            new_results.writerow((name, oldscore))
    else:
        # The else block on a for loop executes if the loop ran without break-ing
        new_results.writerow((student_full_name, correct))
    # If we got here, no exceptions, so let's keep the new data to replace the old
    outf.delete = False

# Atomically replace the original file with the temp file with updated data
os.replace(outf.name, origfile)
I am not too sure the best way to word this, but what I want to do, is read a pdf file, make various modifications, and save the modified pdf over the original file. As of now, I am able to save the modified pdf to a separate file, but I am looking to replace the original, not create a new file.
Here is my current code:
from pyPdf import PdfFileWriter, PdfFileReader

output = PdfFileWriter()
input = PdfFileReader(file('input.pdf', 'rb'))
blank = PdfFileReader(file('C:\\BLANK.pdf', 'rb'))

# Copy the input pdf to the output.
for page in range(int(input.getNumPages())):
    output.addPage(input.getPage(page))

# Add a blank page if needed.
if (input.getNumPages() % 2 != 0):
    output.addPage(blank.getPage(0))

# Write the output to pdf.
outputStream = file('input.pdf', 'wb')
output.write(outputStream)
outputStream.close()
If I change the outputStream to a different file name, it works fine; I just can't save over the input file because it is still being used. I have tried to .close() the stream, but that gave me errors as well.
I have a feeling this has a fairly simple solution, I just haven't had any luck finding it.
Thanks!
You can always rename the temporary output file to the old file:
import os
f = open('input.pdf', 'rb')
# do stuff to temp.pdf
f.close()
os.rename('temp.pdf', 'input.pdf')
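Fleshing that out against the question's code, a hedged sketch (it assumes the pyPdf API exactly as used above; 'temp.pdf' is a placeholder name):

import os
from pyPdf import PdfFileWriter, PdfFileReader

inputStream = open('input.pdf', 'rb')
input = PdfFileReader(inputStream)
output = PdfFileWriter()
for page in range(input.getNumPages()):
    output.addPage(input.getPage(page))

# Write to a temporary name so 'input.pdf' stays untouched while it is open.
outputStream = open('temp.pdf', 'wb')
output.write(outputStream)
outputStream.close()

inputStream.close()  # release the original before renaming over it
os.rename('temp.pdf', 'input.pdf')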
You said you tried to close() the stream but got errors? You could delete the PdfFileReader objects to ensure nobody still has access to the stream. And then close the stream.
from pyPdf import PdfFileWriter, PdfFileReader
inputStream = file('input.pdf', 'rb')
blankStream = file('C:\\BLANK.pdf', 'rb')
output = PdfFileWriter()
input = PdfFileReader(inputStream)
blank = PdfFileReader(blankStream)
...
del input # PdfFileReader won't mess with the stream anymore
inputStream.close()
del blank
blankStream.close()
# Write the output to pdf.
outputStream = file('input.pdf', 'wb')
output.write(outputStream)
outputStream.close()
If the PDFs are small enough (that'll depend on your platform), you could just read the whole thing in, close the file, modify the data, then write the whole thing back over the same file.
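A hedged sketch of that read-everything-first idea, reusing the question's pyPdf-era code (whether it fits depends on the PDF sizes, as noted above):

from StringIO import StringIO  # matches the Python 2 era of the question's pyPdf code
from pyPdf import PdfFileWriter, PdfFileReader

# Read the whole PDF into memory, then close the file immediately.
with open('input.pdf', 'rb') as fh:
    data = fh.read()

input = PdfFileReader(StringIO(data))  # no open handle on input.pdf anymore
output = PdfFileWriter()
for page in range(input.getNumPages()):
    output.addPage(input.getPage(page))

# Safe to overwrite the original now.
with open('input.pdf', 'wb') as fh:
    output.write(fh)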