instantiate python class from class available as string , only in memory! - python

I'm using Reportlab to create PDFs. I'm creating two PDFs which I want to merge after I created them. Reportlab provides a way to save a pycanvas (source) (which is basically my pdf file in memory) as a python file, and calling the method doIt(filename) on that python file, will recreate the pdf file. This is great, since you can combine two PDFs on source code basis and create one merge pdf.
This is done like this:
from reportlab.pdfgen import canvas, pycanvas
#create your canvas
p = pycanvas.Canvas(buffer,pagesize=PAGESIZE)
#...instantiate your pdf...
# after that, close the PDF object cleanly.
p.showPage()
p.save()
#now create the string equivalent of your canvas
source_code_equiv = str(p)
source_code_equiv2 = str(p)
#merge the two files on str. basis
#not shown how it is exactly done, to make it more easy to read the source
#actually one just have to take the middle part of source_code_equiv2 and add it into source_code_equiv
final_pdf = source_code_equiv_part1 + source_code_equiv2_center_part + source_code_equiv_part2
#write the source-code equivalent of the pdf
open("n2.py","w").write(final_pdf)
from myproject import n2
p = n2.doIt(buffer)
# Get the value of the StringIO buffer and write it to the response.
pdf = buffer.getvalue()
buffer.close()
response.write(pdf)
return response
This works fine, but I want to skip the step that I save the n2.py to the disk. Thus I'm looking for a way to instantiate from the final_pdf string the corresponding python class and use it directly in the source. Is this possible?
It should work somehow like this..
n2 = instantiate_python_class_from_source(final_pdf)
p = n2.doIt(buffer)
The reason for this is mainly that there is not really a need to save the source to the disk, and secondly that it is absolutely not thread save. I could name the created file at run time, but then I do not know what to import!? If there is no way to prevent the file saving, is there a way to define the import based on the name of the file, which is defined at runtime!?
One might ask why I do not create one pdf in advance, but this is not possible, since they are coming from different applications.

This seems like a really long way around to what you want. Doesn't Reportlab have a Canvas class from which you can pull the PDF document? I don't see why generated Python source code should be involved here.
But if for some reason it is necessary, then you can use StringIO to "write" the source to a string, then exec to execute it:
from cStringIO import StringIO
source_code = StringIO()
source_code.write(final_pdf)
exec(source_code)
p = doIt(buffer)

Ok, I guess you could use code module which provides standard interpreter’s interactive mode. The following would execute function doIt.
import code
import string
coded_data = """
def doIt():
print "XXXXX"
"""
script = coded_data + "\ndoIt()\n"
co = code.compile_command(script, "<stdin>", "exec")
if co:
exec co
Let me know, if this helped.

Related

Writing a Python pdfrw PdfReader object to an array of bytes / filestream

I'm currently working on a simple proof of concept for a pdf-editor application. The example is supposed to be a simplified python script showcasing how we could use the pdfrw library to edit PDF files with forms in them.
So, here's the issue. I'm not interested in writing the edited PDF to a file.
The idea is that file opening and closing is going to most likely be handled by external code and so I want all the edits in my files to be done in memory. I don't want to write the edited filestream to a local file.
Let me specify what I mean by this. I currently have a piece of code like this:
class FormFiller:
def __fill_pdf__(input_pdf_filestream : bytes, data_dict : dict):
template_pdf : pdfrw.PdfReader = pdfrw.PdfReader(input_pdf_filestream)
# <some editing magic here>
return template_pdf
def fillForm(self,mapper : FieldMapper):
value_mapping : dict = mapper.getValues()
filled_pdf : pdfrw.PdfReader = self.__fill_pdf__(self.filesteam, value_mapping)
#<this point is crucial>
def __init__(self, filestream : bytes):
self.filesteam : bytes = filestream
So, as you see the FormFiller constructor receives an array of bytes. In fact, it's an io.BytesIO object. The template_pdf variable uses a PdfReader object from the pdfrw library. Now, when we get to the #<this point is crucial> marker, I have a filled_pdf variable which is a PdfReader object. I would like to convert it to a filestream (a bytes array, or an io.BytesIO object if you will), and return it in that form. I don't want to write it to a file. However, the writer class provided by pdfrw (pdfrw.PdfWriter) does not allow for such an operation. It only provides a write(<filename>) method, which saves the PdfReader object to a pdf output file.
How should I approach this? Do you recommend a workaround? Or perhaps I should use a completely different library to accomplish this?
Please help :-(
To save your altered PDF to memory in an object that can be passed around (instead of writing to a file), simply create an empty instance of io.BytesIO:
from io import BytesIO
new_bytes_object = BytesIO()
Then, use pdfrw's PdfWriter.write() method to write your data to the empty BytesIO object:
pdfrw.PdfWriter.write(new_bytes_object, filled_pdf)
# I'm not sure about the syntax, I haven't used this lib before
This works because io.BytesIO objects act like a file object, also known as a file-like object. It and related classes like io.StringIO behave like files in memory, such as the object f created with the built-in function open below:
with open("output.txt", "a") as f:
f.write(some_data)
Before you attempt to read from new_bytes_object, don't forget to seek(0) back to the beginning, or rewind it. Otherwise, the object seems empty.
new_bytes_object.seek(0)

Python 3.7 - Save and import a file holding Python variables

This seems like a pretty obvious/dumb question, but there are a few specifications that make this a bit harder.
Let's say I have a program that takes 3 numbers from a user and does mathematical processes to them to get outputs. Then I open("file", "r") to write those variables to a file.
Then, let's say another program then imports them and uses them for other processes. I need to be able to import that file as Python code. To be clear: I am not saving text, I am saving python code to a file that is not a .py file.
Is there any way to save and import Python code to and from a non-.py file? And how?
EDIT: In the file I'm saving and importing, I'm also saving Python functions. I cannot simply save the variables themselves; I need the variable names, values, and python functions to be saved as normal text in a file, but when I import the file, it should be parsed as Python code.
Probably not a good idea to store computation result as code & then import it from elsewhere. You should use a proper data format to store the results - and import it as data. Use JSON or pickle etc.
However, if you do want to shoot yourself in the foot, Python gives you the tools to do that:
Let's say i have some code in a file temp.txt
number3=30
def f():
return 'method'
Then you can do this:
with open('temp.txt') as f:
code = f.read()
exec(code)
print(number3)
print(f())
Which outputs:
30
method
If i got this right, this might be done via eval function e.g. you save all code to be executed into a string and then save into a file.
When you need that executed read the file, tke the string and eval it
I must say however that using eval is a bad (very bad) practice and i would advice against it unless there is no other solution that you can find

The extractText() fucntion does not return text

pdfFileObject = open('MDD.pdf', 'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObject)
count = pdfReader.numPages
for i in range(count):
page = pdfReader.getPage(i)
print(page.extractText()
Above is my code and when i run the script it just outputs a bunch of numbers and numerical(s) and not the text of the file. Could anyone help me with that?
This function doesn't work for all PDF files. This is explained in documentation:
This works well for some PDF
files, but poorly for others, depending on the generator used. This will
be refined in the future. Do not rely on the order of text coming out of
this function, as it will change if this function is made more
sophisticated.
:return: a unicode string object.
Try your code on this file. I'm sure it should work, so it seems that the problem is not in your code.
If you really need to parse files that are created the same way as your original MDD.pdf you have to choose another library.

How to encapsulate method to print console out to Word

I want to print my summary stats sometimes to console, and also other times to Word.
I don't want my code to be littered with lines calling to Word, because then I'd need to find and comment out like 100 lines each time I just wanted the console output.
I've thought about using a flag variable up at the front and changing it to false when I wanted to print versus not, but that's also a hassle.
The best solution I came up with was to write a separate script that opens a document, writes by calling my first summary stats script, and then closes the document:
import sys
import RunSummaryStats
from docx import Document
filename = "demo.docx"
document = Document()
document.save(filename)
f = open(filename, 'w')
sys.stdout = f
# actually call my summary stats script here: Call RunSummaryStats etc.
print("5")
f.close()
However, when I tried doing the above with python docx, upon opening my docs file I received the error We're sorry, we can't open this document because some parts are missing or invalid. As you can see the code above just printed out one number so it can't be a problem with the data I'm trying to write.
Finally, it needs to go to Word and not other file formats, to format some data tables.
By the way, this is an excerpt of RunSummaryStats. You can see how it's already filled with print lines which are helpful when I'm still exploring the data, and which I don't want to get rid of/replace with adding into a list:
The easy thing is to let cStringIO do the work, and separate collecting all your data from writing it into a file. That is:
import RunSummaryStats
import sys
# first, collect all your data into a StringIO object
orig_stdout = sys.stdout
stat_buffer = cStringIO.StringIO()
sys.stdout = stat_buffer
try:
# actually call my summary stats script here: Call RunSummaryStats etc.
print("5")
finally:
sys.stdout = orig_stdout
# then, write the StringIO object's contents to a Document.
from docx import Document
filename = "demo.docx"
document = Document()
document.write(add_paragraph(stat_buffer.getvalue()))
document.save(filename)
The Document() constructor essentially creates the .docx file package (this is actually a .zip archive of lots of XML and other stuff, which later the Word Application parses and renders etc.).
This statement f = open(filename, 'w') opens that file object (NB: this does not open Word Application, nor does it open a Word Document instance) and then you dump your stdout into that object. That is 100% of the time going to result in a corrupted Word Document; because you simply cannot write to a word document that way. You're basically creating a plain text file with a docx extension, but none of the underlying "guts" that make a docx a docx. As a result, Word Application doesn't know what to do with it.
Modify your code so that this "summary" procedure returns an iterable (the items in this iterable will be whatever you want to put in the Word Document). Then you can use something like the add_paragraph method to add each item to the Word Document.
def get_summary_stats(console=False):
"""
if console==True, results will be printed to console
returns a list of string for use elsewhere
"""
# hardcoded, but presume you will actually *get* these information somehow, modify as needed:
stats = ["some statistics about something", "more stuff about things"]
if console:
for s in stats:
print(s)
return stats
Then:
filename = "demo.docx"
document = Document()
# True will cause this function to print to console
for stat in get_summary_stats(True):
document.add_paragraph(stat)
document.save(filename)
So maybe there was a better way to do it, but in the end I
created a single function out of my summary stats script def run_summary
created a function based on #Charles Duffy's answer def print_word where StringIO reads from RunSummaryStats.run_summary(filepath, filename)
called def_print_word in my final module. There I set the variables for path, filename, and raw data source like so:
PrintScriptToWord.print_word(ATSpath, RSBSfilename, curr_file + ".docx")
I welcome any suggestions to improve this or other approaches.

Python Image Uploading with AjaxUpload

I'm trying to use AjaxUpload with Python:
http://valums.com/ajax-upload/
I would like to know how to access the uploaded file with Python. On the web site, it says:
* PHP: $_FILES['userfile']
* Rails: params[:userfile]
What is the Syntax for Python?
request.params['userfile'] doesn't seem to work.
Thanks in advance! Here is my current code (using PIL imported as Image)
im = Image.open(request.params['myFile'].file)
import cgi
#This will give you the data of the file,
# but won't give you the filename, unfortunately.
# For that you have to do some other trick.
file_data = cgi.FieldStorage.getfirst('file')
#<IGNORE if you're not using mod_python>
#(If you're using mod_python you can also get the Request object
# by passing 'req' to the relevant function in 'index.py', like "def func(req):"
# Then you access it with req.form.getfirst('file') instead. NOTE that the
# first method will work even when using mod_python, but the first FieldStorage
# object called is the only one with relevant data, so if you pass 'req' to the
# function you have to use the method that uses 'req'.)
#</IGNORE>
#Then you can write it to a file like so...
file = open('example_filename.wtvr','w')#'w' is for 'write'
file.write(file_data)
file.close()
#Then access it like so...
file = open('example_filename.wtvr','r')#'r' is for 'read'
#And use file.read() or whatever else to do what you want.
I'm working with Pyramid, and I was trying to do the same thing. After some time I came up with this solution.
from cStringIO import StringIO
from cgi import FieldStorage
fs = FieldStorage(fp=request['wsgi.input'], environ=request)
f = StringIO(fs.value)
im = Image.open(f)
I'm not sure if it's the "right" one, but it seems to work.
in django, you can use:
request.FILES['file']
instead of:
request.POST['file']
i did not know how to do in pylons...maybe it is the same concept..

Categories