I have a BibTeX file that I get from the frontend, and I'm trying to parse it with pybtex (a Python library for parsing BibTeX files). Because the file comes from the frontend, it's not stored in a file on my computer. It gets passed through a variable from the frontend to Python and is then stored in the Python variable fileFromFrontend. So I can use, for example:
bibtexFile = fileFromFrontend.read()
to read the file.
Now I'm trying to do something like the following to print the parsed file in the Python terminal:
from pybtex.database.input import bibtex
parser = bibtex.Parser()
bibtexFile= parser.parse_file(fileFromFrontend)
print(bibtexFile.entries)
But then I get this error:
-->bibtexFile = parser.parse_file(filesFromFrontend)
-->with open_file(filename, encoding=self.encoding) as f:
-->AttributeError: __enter__
This is probably because the parser tries to open the file, but it doesn't need to open this file; it just needs to read it. I don't know which pybtex function to use for parsing the file from a variable, and I haven't found anything so far that solves my problem.
Hopefully somebody can help.
Thanks
According to the documentation (https://docs.pybtex.org/api/parsing.html), there are parse_string and parse_bytes methods, which should work.
So, something like this:
from pybtex.database.input import bibtex
parser = bibtex.Parser()
# parse_bytes expects bytes; if read() returns a str, use parser.parse_string instead
bibtexFile = parser.parse_bytes(fileFromFrontend.read())
print(bibtexFile.entries)
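I don't have pybtex installed, so I couldn't try it myself, but try those methods. Note that the module-level functions pybtex.database.parse_string and pybtex.database.parse_bytes need the bib format as a second parameter; in the documentation's examples that is "bibtex". The parser methods used above take only the data.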
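Whether fileFromFrontend.read() returns bytes or str depends on how the frontend framework hands over the upload, and parse_bytes wants bytes while parse_string wants a string. A minimal sketch, assuming fileFromFrontend behaves like a standard file object, that normalizes the upload to text first; the helper name read_upload_as_text and the UTF-8 default are illustrative assumptions:

```python
import io


def read_upload_as_text(upload, encoding="utf-8"):
    """Return the contents of a file-like upload as str.

    Assumption: `upload` has a .read() method that returns either bytes
    (binary stream) or str (text stream).
    """
    data = upload.read()
    if isinstance(data, bytes):
        # Binary upload: decode it with the assumed encoding.
        data = data.decode(encoding)
    return data


# Stand-in for fileFromFrontend: an in-memory binary stream.
fake_upload = io.BytesIO(b"@article{key1, title={A Title}, year={2020}}")
bibtex_text = read_upload_as_text(fake_upload)
print(bibtex_text)
```

With pybtex installed, the resulting string could then be passed to parser.parse_string(bibtex_text) or to pybtex.database.parse_string(bibtex_text, "bibtex").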
Is there a way to convert STL (STereoLithography) files to PLY (Polygon File Format) in Python? I have to use another program that only accepts PLY.
Here is the link to the program that I want to use: http://www.cs.jhu.edu/~misha/Code/ShapeSPH/ShapeDescriptor/
Any other suggestions regarding how I could use this program?
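For ASCII STL files, the conversion is simple enough to do in plain Python. A minimal sketch, assuming ASCII (not binary) STL input; it merges identical vertices and writes an ASCII PLY with vertex and face elements. The function name is hypothetical, and binary STL would need separate handling (for example with the struct module):

```python
def stl_ascii_to_ply(stl_text):
    """Convert ASCII STL text to ASCII PLY text (binary STL is not handled)."""
    vertices = []   # unique (x, y, z) tuples in first-seen order
    index_of = {}   # (x, y, z) -> index into vertices
    faces = []      # one list of vertex indices per facet
    current = []
    for line in stl_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "vertex":
            v = tuple(float(p) for p in parts[1:4])
            if v not in index_of:
                index_of[v] = len(vertices)
                vertices.append(v)
            current.append(index_of[v])
        elif parts[0] == "endfacet":
            faces.append(current)
            current = []
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(vertices)}",
        "property float x",
        "property float y",
        "property float z",
        f"element face {len(faces)}",
        "property list uchar int vertex_indices",
        "end_header",
    ]
    lines = header + [f"{x} {y} {z}" for x, y, z in vertices]
    lines += [f"{len(f)} " + " ".join(str(i) for i in f) for f in faces]
    return "\n".join(lines) + "\n"


# Two triangles forming a unit square; they share two vertices.
square_stl = """solid square
facet normal 0 0 1
  outer loop
    vertex 0 0 0
    vertex 1 0 0
    vertex 1 1 0
  endloop
endfacet
facet normal 0 0 1
  outer loop
    vertex 0 0 0
    vertex 1 1 0
    vertex 0 1 0
  endloop
endfacet
endsolid square
"""

ply_text = stl_ascii_to_ply(square_stl)
print(ply_text)
```

Deduplicating vertices keeps the mesh connected in the PLY output, which matters for tools that analyze shape rather than just render triangles.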
My second question: can I run this .exe file from Python? Should it be something like:
os.popen("file.exe --in value", shell=True)
os.system("file.exe --in value")
Am I right?
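On the second question: os.system("file.exe --in value") will run the program, but os.popen does not accept a shell argument, so the first line would raise a TypeError. The subprocess module is the usual choice. A minimal sketch that passes the arguments as a list (so paths with spaces need no extra quoting); file.exe, --in and value are the placeholders from the question, the helper name run_tool is hypothetical, and the runnable part uses the current Python interpreter as a stand-in executable:

```python
import subprocess
import sys


def run_tool(exe_path, *args):
    """Run an executable with arguments and return its stdout as text."""
    result = subprocess.run(
        [exe_path, *args],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output to str
        check=True,               # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# For the program in the question this would be something like:
#   run_tool("file.exe", "--in", "value")
# Runnable stand-in: the current Python interpreter echoes its arguments.
output = run_tool(sys.executable, "-c", "import sys; print(sys.argv[1:])", "--in", "value")
print(output)
```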
I am trying to generate a PDF using FOP. To do this, I take a template file, fill in its values with Jinja2, and then pass it to FOP with a system call.
Is it possible to do a subprocess call to FOP without passing an input file, but instead passing a string containing the XML directly? And if so, how would I go about doing so?
I was hoping for something like this:
fop -fo "XML here" -pdf output.pdf
Yes, it actually was possible.
Using Python, I was able to load the XML from the file into lxml.etree:
from lxml import etree
tree = etree.parse('FOP_PARENT.fo.xml')
Then I used the tree to resolve the XInclude tags:
tree.xinclude()
Then it was simply a case of converting the XML back into a Unicode string:
xml = etree.tostring(tree, encoding='unicode')
This is how I got the templates to work. Hopefully this helps someone who has the same issue!
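The steps above produce the final XML as a Python string. To hand that string to FOP without writing a temporary file, one option is to pipe it to FOP's standard input: FOP's command-line help describes using - as the input file to read from stdin, but check that against your FOP version. A minimal sketch with a hypothetical helper pipe_string_to_command; the runnable part uses a small Python child process in place of FOP:

```python
import subprocess
import sys


def pipe_string_to_command(cmd, text, encoding="utf-8"):
    """Run `cmd` (a list of arguments), feed `text` to its stdin, return its stdout."""
    result = subprocess.run(
        cmd,
        input=text.encode(encoding),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,  # raise CalledProcessError if the command fails
    )
    return result.stdout


# For FOP this might look like (assumption: your FOP accepts "-" for stdin):
#   pipe_string_to_command(["fop", "-fo", "-", "-pdf", "output.pdf"], xml)

# Runnable stand-in: a child Python process that echoes what it receives.
echo_cmd = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
out = pipe_string_to_command(echo_cmd, "<fo:root/>")
print(out.decode("utf-8"))
```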
How to get Python source code representation of in-memory Python dictionary?
I decided to ask this question after reading Thomas Kluyver's comment on Rob Galanakis' blog post titled "Why bother with python and config files?" In his comment, Thomas states:
But if you want any way to change settings inside the application (like a preferences dialog), there’s no good way to automatically write a correct Python file.
Assuming it uses only "basic" Python types, you can write out the repr() of the structure, and then use ast.literal_eval() to read it back in after.
As the article says, you're better off using JSON/YAML or other formats, but if you seriously wanted to use a Python dict and are only using basic Python types...
Writing out (using pformat instead of repr() to make the output more human-readable):
from pprint import pformat
d = dict(enumerate('abcdefghijklmnopqrstuvwxyz'))
# open in write mode ('w'); the default mode 'r' cannot be written to
with open('somefile.py', 'w') as f:
    f.write(pformat(d))
Reading back:
from ast import literal_eval
with open('somefile.py') as f:
    d = literal_eval(f.read())
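A minimal round-trip sketch of this approach, with the files opened via with and an explicit encoding; the helper names, the example settings and the temporary path are illustrative. The last part shows why literal_eval is the right reader: it rejects anything that is not a plain literal instead of executing it.

```python
import ast
import os
import tempfile
from pprint import pformat


def save_settings(path, settings):
    """Write a dict of basic Python types as a readable Python literal."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(pformat(settings))
        f.write("\n")


def load_settings(path):
    """Read the literal back; only literals are accepted, nothing is executed."""
    with open(path, encoding="utf-8") as f:
        return ast.literal_eval(f.read())


settings = {"theme": "dark", "font_size": 12, "recent": ["a.txt", "b.txt"], "enabled": True}
path = os.path.join(tempfile.mkdtemp(), "settings.py")
save_settings(path, settings)
loaded = load_settings(path)
print(loaded == settings)  # True

# A non-literal expression is rejected instead of being evaluated.
rejected = False
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    rejected = True
print(rejected)  # True
```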
I'm trying to use pyPdf to extract and print pages from a multipage PDF. Problem is, text is not extracted from some pages. I've put an example file here:
http://www.4shared.com/document/kmJF67E4/forms.html
If you run the following, the first 81 pages return no text, while the final 11 extract properly. Can anyone help?
from pyPdf import PdfFileReader
reader = PdfFileReader(open("forms.pdf", "rb"))
for page in reader.pages:
    print(page.extractText())
Note that extractText() still has problems extracting the text properly. From the documentation for extractText():
This works well for some PDF files, but poorly for others, depending on the generator used. This will be refined in the future. Do not rely on the order of text coming out of this function, as it will change if this function is made more sophisticated.
Since it is the text you want, you can use the pdftotext command-line tool (part of Xpdf and Poppler, and commonly available on Linux).
To invoke that using Python, you can do this:
>>> import subprocess
>>> subprocess.call(['pdftotext', 'forms.pdf', 'output'])
The text is extracted from forms.pdf and saved to output.
This works in the case of your PDF file and extracts the text you want.
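If you want the text back in Python rather than in a file, pdftotext accepts - as the output file to write to stdout, -f and -l to choose the first and last page, and -enc to choose the output encoding; by default it ends each page with a form feed (\f), which makes it easy to split the result into pages. A sketch, assuming pdftotext is on the PATH; the helper names are hypothetical, and the runnable part only builds a command and splits a sample string:

```python
import subprocess


def pdftotext_cmd(pdf_path, first_page=None, last_page=None):
    """Build a pdftotext command that writes UTF-8 text to stdout."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if first_page is not None:
        cmd += ["-f", str(first_page)]
    if last_page is not None:
        cmd += ["-l", str(last_page)]
    cmd += [pdf_path, "-"]  # "-" as the output file means stdout
    return cmd


def split_pages(text):
    """Split pdftotext output on form feeds, dropping the empty tail after the last one."""
    pages = text.split("\f")
    if pages and pages[-1] == "":
        pages.pop()
    return pages


def extract_pages(pdf_path, first_page=None, last_page=None):
    """Run pdftotext (must be installed) and return a list of page texts."""
    out = subprocess.run(
        pdftotext_cmd(pdf_path, first_page, last_page),
        stdout=subprocess.PIPE,
        check=True,
    ).stdout
    return split_pages(out.decode("utf-8", errors="replace"))


# Runnable without pdftotext: inspect the command and split sample output.
cmd = pdftotext_cmd("forms.pdf", first_page=1, last_page=82)
print(cmd)
sample = split_pages("page one\fpage two\f")
print(sample)
```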
This isn't really an answer, but the problem with pyPdf is this: it doesn't yet support CMaps. PDF allows fonts to use CMaps to map character IDs (bytes in the PDF) to Unicode character codes. When a PDF contains non-ASCII characters, a CMap is probably in use, and sometimes one is used even when there are no non-ASCII characters. When pyPdf encounters strings that are not in a standard Unicode encoding, it just sees a sequence of bytes; it can't convert those bytes to Unicode, so it gives you empty strings. I actually had this same problem and am working on the source code at the moment. It's time-consuming, but I hope to send a patch to the maintainer some time around mid-2011.
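To make the CMap point concrete: a font's ToUnicode CMap maps character codes to Unicode text, and without it an extractor only has raw codes. A toy sketch of the idea, assuming one-byte codes and only bfchar entries (real CMaps also use bfrange and multi-byte codespaces); the helper names are hypothetical and this is not how pyPdf is implemented:

```python
import re


def parse_bfchar(cmap_text):
    """Collect <code> <unicode> pairs from beginbfchar ... endbfchar sections."""
    mapping = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block):
            # Destination values are UTF-16BE hex strings in ToUnicode CMaps.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def decode_with_cmap(raw, mapping):
    """Decode one-byte character codes; unknown codes become U+FFFD."""
    return "".join(mapping.get(code, "\ufffd") for code in raw)


cmap = """
2 beginbfchar
<01> <0048>
<02> <0069>
endbfchar
"""
m = parse_bfchar(cmap)
print(decode_with_cmap(b"\x01\x02", m))  # Hi
```

Without the mapping, the bytes 0x01 0x02 carry no meaning on their own, which is why an extractor that ignores CMaps ends up with nothing useful for such pages.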
You could also try the pdfminer library (also in Python) and see if it's better at extracting the text. For splitting, however, you will have to stick with pyPdf, as pdfminer doesn't support that.
I sometimes find it useful to convert the PDF to PostScript (try both pdf2ps and pdftops, since they may give different results) and then back to PDF (ps2pdf). Then try your original script again.
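The round trip described above could be scripted like this. A sketch, assuming Poppler's pdftops (or Ghostscript's pdf2ps) and Ghostscript's ps2pdf are installed and each is called as tool input output; the file names and helper names are placeholders, and the runnable part only prints the commands:

```python
import subprocess


def roundtrip_commands(pdf_in, ps_tmp, pdf_out, to_ps="pdftops"):
    """Commands for PDF -> PostScript -> PDF; to_ps may be "pdftops" or "pdf2ps"."""
    return [
        [to_ps, pdf_in, ps_tmp],
        ["ps2pdf", ps_tmp, pdf_out],
    ]


def run_roundtrip(pdf_in, ps_tmp, pdf_out, to_ps="pdftops"):
    """Run both steps, stopping with CalledProcessError if either one fails."""
    for cmd in roundtrip_commands(pdf_in, ps_tmp, pdf_out, to_ps):
        subprocess.run(cmd, check=True)


for cmd in roundtrip_commands("forms.pdf", "forms.ps", "forms_clean.pdf", to_ps="pdf2ps"):
    print(" ".join(cmd))
```

After run_roundtrip(...) succeeds, point the original pyPdf script at the regenerated PDF.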
I had a similar problem with some PDFs, and on Windows this works very well for me:
1.- Download the Xpdf tools for Windows.
2.- Copy pdftotext.exe from xpdf-tools-win-4.00\bin32 to C:\Windows\System32 and also to C:\Windows\SysWOW64.
3.- Use subprocess to run the command:
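import subprocess
# filePath: path of the PDF to read; "-" makes pdftotext write the text to stdout.
# Passing a list (without shell=True) keeps paths that contain spaces intact.
try:
    extInfo = subprocess.check_output(['pdftotext.exe', filePath, '-'], stderr=subprocess.STDOUT).strip()
except Exception as e:
    print(e)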
I'm starting to think I should adopt a messy two-part solution. There are two sections to the PDF: pp. 1-82, which have text page labels (pdftotext can extract these), and pp. 83 to the end, which have no page labels but which pyPDF can extract, and pyPDF explicitly knows the page boundaries.
I think I need to combine the two. Clunky, but I don't see any way round it. Sadly I'm having to do this on a Windows machine.
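The two-part idea can be kept fairly generic: extract every page with both tools and, page by page, keep whichever result is not blank. A sketch, assuming you already have two equally long lists of page texts (for example pdftotext output split on form feeds, and pyPdf's extractText() per page); the function name and the preference for the first source are illustrative choices:

```python
def merge_page_texts(primary, fallback):
    """Per page, keep the primary text unless it is blank; otherwise use the fallback."""
    if len(primary) != len(fallback):
        raise ValueError("both sources must cover the same pages")
    merged = []
    for first, second in zip(primary, fallback):
        merged.append(first if first.strip() else second)
    return merged


pdftotext_pages = ["Form A text", "Form B text", ""]
pypdf_pages = ["", "", "Page 3 text"]
print(merge_page_texts(pdftotext_pages, pypdf_pages))
```

This avoids hard-coding the page-82 boundary, at the cost of running both extractors over every page.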