How to make a PDF parser in Python from scratch [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
I want to write a PDF parser from scratch in Python. Failing that, I would welcome any leads on adapting existing libraries or algorithms.

Here are some tools that may fit your needs:
pdfrw: Read and write PDF files
slate: Active development. Simplifies extracting text from PDF files
PyPDF2: Active development. Split, merge, crop, etc.
PDFMiner: Active development. Extracts text, images, object coordinates, and metadata from PDF files
There are more in this link.
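If you do want to start from scratch, a first step is usually to read the file skeleton before touching any objects. Below is a minimal sketch using only the standard library: it reads the version from the `%PDF-` header and the byte offset that follows the last `startxref` keyword near the end of the file. The function name `read_pdf_skeleton` is made up for illustration, and the sketch assumes a well-formed file with a classic trailer; it does not parse the cross-reference table or any objects.

```python
import re


def read_pdf_skeleton(data: bytes) -> dict:
    """Return the PDF version and the offset of the cross-reference section.

    Hypothetical helper for illustration: it only inspects the header and
    the trailer, and assumes the header sits at byte 0.
    """
    header = re.match(rb"%PDF-(\d+\.\d+)", data)
    if header is None:
        raise ValueError("not a PDF: missing %PDF- header")

    # The trailer sits at the end of the file, so only the tail is searched.
    tail = data[-1024:]
    offsets = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not offsets:
        raise ValueError("missing startxref / %%EOF trailer")

    return {
        "version": header.group(1).decode("ascii"),
        # With incremental updates there can be several; the last one wins.
        "xref_offset": int(offsets[-1]),
    }
```

From there, a parser would seek to `xref_offset`, read the cross-reference entries, and use them to locate individual objects.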

Related

Tutorials on how to create a database in Python? [closed]

Closed 11 months ago.
I'm looking to start building a database with Python, partly so I can make more dynamic web pages and partly just as a project. I want it stored as a local file, such as a .db file, but I can't find any intermediate-friendly tutorials, or any tutorials that aren't about online cloud options.
A relatively powerful option for Python is sqlite3 from the stdlib.
You can find tutorials for it in places like YouTube and sqlitetutorial.net.
For a better understanding of how the library is intended to be used, see the official documentation on Python's website.
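To give a feel for sqlite3, here is a minimal sketch of a file-backed database. The table name `pages`, its columns, and the helper names are made up for illustration; pass a path such as "site.db" to get a single local file, or ":memory:" to experiment without touching disk.

```python
import sqlite3


def open_db(path):
    """Open (or create) the database file and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, body TEXT)"
    )
    return conn


def add_page(conn, title, body):
    """Insert one row and return its id."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            # '?' placeholders let sqlite3 handle quoting safely
            "INSERT INTO pages (title, body) VALUES (?, ?)",
            (title, body),
        )
    return cur.lastrowid


def get_titles(conn):
    """Return all page titles in insertion order."""
    return [row[0] for row in conn.execute("SELECT title FROM pages ORDER BY id")]
```

A web page handler could then call `get_titles(conn)` and render the result, which covers the "more dynamic web pages" use case without any server or cloud service.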

Convert PDF to CSV or xlsx with python [closed]

Closed last year.
I'm trying to convert the entire contents of a PDF into a CSV or xlsx file with Python, and I've hit a wall.
I know there is an API called PDFTables that works perfectly, but I have over 400 documents to convert and it is a paid service that I can't afford, so it isn't feasible for me. I have also tried another library, tabula, but as far as I know it only works on the tables in a PDF.
With this in mind, are there any other options available?
Thank you in advance.
If you don't need it to be programmatic, have you seen https://www.adobe.com/la/acrobat/online/pdf-to-excel.html?
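If it does need to be programmatic, one rough approach is to extract the text with whatever PDF library you settle on and then write it out with the standard csv module. The sketch below covers only that second half: it assumes you already have the page text as a list of lines, and it guesses column boundaries from runs of two or more spaces. The function name `lines_to_csv` and the splitting rule are assumptions for illustration, not something any of the libraries above provides.

```python
import csv
import re


def lines_to_csv(lines, out):
    """Write already-extracted text lines to `out` as CSV rows.

    Columns are guessed by splitting on runs of 2+ whitespace characters,
    which is a heuristic and will not suit every layout.
    """
    writer = csv.writer(out)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between paragraphs
        writer.writerow(re.split(r"\s{2,}", line))
```

With a real file you would open it as `open("out.csv", "w", newline="")` and pass that as `out`, once per document in a loop over the 400 PDFs.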

Modify a LibreOffice Writer document from Python for automated reporting [closed]

Closed 3 years ago.
I would like to generate automated reports from my Python program. I was using the ReportLab PDF solution before, but my needs are evolving: I now need to be able to comment on and modify the report afterwards.
So I was thinking of creating an empty LibreOffice Writer document to act as a template (with logos, a first page, etc.). My program would copy this document, fill it with data, text, and pictures from Python, and save it under a new name.
That way the report is close to complete and I can adjust it myself at the end.
Do you know if it is possible to do that?
Thanks for your help!
One solution I found is the python-docx library. It can open and save .docx documents. That should work with Writer, since Writer can open and save .docx files.
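If you would rather stay with native .odt templates, another possible route uses only the standard library: an .odt file is a ZIP package whose text lives in content.xml, so a template can be copied and have placeholders replaced. This is a sketch under assumptions: the `{{name}}` placeholder syntax and the function name `fill_odt_template` are made up here, and it only works if Writer stored each placeholder as one unbroken run of text (retyping the placeholder in one go usually helps). It handles text only, not pictures.

```python
import zipfile
from xml.sax.saxutils import escape


def fill_odt_template(template_path, output_path, values):
    """Copy an .odt template, replacing {{key}} placeholders in content.xml."""
    with zipfile.ZipFile(template_path) as zin, \
            zipfile.ZipFile(output_path, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "content.xml":
                text = data.decode("utf-8")
                for key, value in values.items():
                    # escape() keeps characters like & and < valid in XML
                    text = text.replace("{{" + key + "}}", escape(str(value)))
                data = text.encode("utf-8")
            # Passing the original ZipInfo keeps entry order and compression,
            # so 'mimetype' stays first and uncompressed as the format expects.
            zout.writestr(item, data)
```

Calling `fill_odt_template("template.odt", "report.odt", {"title": "Weekly report"})` leaves the template untouched and produces a new document you can open and adjust in Writer.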

Library to parse SVG in Ruby or Python [closed]

Closed 6 years ago.
SVG is a huge standard, which is based on XML. I have parsed SVG as XML in the past. However, some things are hard.
For example, I would like to know the size of a group. As far as I can tell, this is only possible by recursively stepping through all the children in the group (noting all their transformations) and accumulating their sizes.
I would love to have a library that could do stuff like that for me. Does something like this exist?
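In Python you have pysvg:
import pysvg.parser
# Parse the SVG file ("your_file.svg" is a placeholder for your own path)
svg = pysvg.parser.parse("your_file.svg")
# Width and height of the parsed document
print(svg.get_width(), svg.get_height())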
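For the group-size part of the question, the recursive walk described there can be sketched with the standard library alone. This is a deliberately limited illustration, not a general SVG solution: it assumes the only shapes are `rect` elements, the only transforms are `translate(...)`, and all lengths are plain numbers without units. The names `bbox` and `_offset` are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"
TRANSLATE = re.compile(r"translate\(\s*(-?[\d.]+)(?:[\s,]+(-?[\d.]+))?\s*\)")


def _offset(elem):
    """Return the (tx, ty) of a translate() transform, or (0, 0) if none."""
    match = TRANSLATE.search(elem.get("transform", ""))
    if not match:
        return 0.0, 0.0
    # translate(tx) with a single argument means ty = 0
    return float(match.group(1)), float(match.group(2) or 0)


def bbox(elem, dx=0.0, dy=0.0):
    """Return (min_x, min_y, max_x, max_y) for elem, or None if it is empty."""
    tx, ty = _offset(elem)
    dx, dy = dx + tx, dy + ty  # translations simply add up down the tree
    if elem.tag == SVG_NS + "rect":
        x = float(elem.get("x", 0)) + dx
        y = float(elem.get("y", 0)) + dy
        return x, y, x + float(elem.get("width", 0)), y + float(elem.get("height", 0))
    # For groups (and anything else), take the union of the children's boxes.
    boxes = [b for b in (bbox(child, dx, dy) for child in elem) if b is not None]
    if not boxes:
        return None
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )
```

The group's size is then `max_x - min_x` by `max_y - min_y`. Supporting rotation, scaling, paths, or text would need full transform matrices and per-shape geometry, which is where a dedicated library earns its keep.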

Good python library for generating audio files? [closed]

Closed 7 years ago.
Can anyone recommend a good library for generating an audio file, such as mp3, wav, or even midi, from python?
I've seen recommendations for working with the id tags (song name, artist, etc) in mp3 files, but this is not my goal.
See http://wiki.python.org/moin/Audio/ and http://wiki.python.org/moin/PythonInMusic, maybe some of the projects listed there can be of help.
Also, Google is your friend.
I've never used it, but check out ounk.
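For plain WAV output you may not need a third-party library at all, since the standard library ships a wave module. Here is a minimal sketch that writes a mono 16-bit sine tone; the function name `write_sine_wav` and its default parameters are made up for illustration, and mp3 or midi output would still need something else.

```python
import math
import struct
import wave


def write_sine_wav(path, freq_hz=440.0, seconds=1.0, rate=44100, amplitude=0.5):
    """Write a mono 16-bit PCM sine tone to `path`; return the frame count."""
    n_frames = int(seconds * rate)
    frames = bytearray()
    for i in range(n_frames):
        sample = amplitude * math.sin(2 * math.pi * freq_hz * i / rate)
        # Scale to the signed 16-bit range and pack as little-endian
        frames += struct.pack("<h", int(sample * 32767))

    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16 bit
        wav.setframerate(rate)
        wav.writeframes(bytes(frames))
    return n_frames
```

Calling `write_sine_wav("tone.wav")` produces one second of a 440 Hz tone; mixing several sines or shaping the amplitude over time builds on the same loop.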
