I'm looking for a good python module to generate pdf417 barcodes. Has anyone used one they liked?
Ideally I would like one with as few dependencies as possible, and one that runs on both linux and MacOSX.
We recently had to tackle this problem as well, and being a Python shop we wanted a Python solution. It became clear that elaphe was the project with the potential to actually generate PDF417 barcodes.
However, we found that it errors out with today's tooling, so we went hunting for a fix. It turns out elaphe must be generating an outdated form of *.eps PostScript that Ghostscript can't interpret, and this is where the barcode generation fails.
Fortunately, elaphe uses a common library behind the scenes called Barcode Writer in Pure PostScript (BWIPP): http://bwipp.terryburton.co.uk
This backend library is used by many projects in multiple languages to generate barcodes. The fix for us was to fork elaphe and correct its *.eps file generation.
To determine what is broken in the *.eps, look at this other site, which is built on postscriptbarcode and lets you generate a pdf417 barcode online (as well as other formats): http://www.terryburton.co.uk/barcodewriter/generator/
Once you generate a pdf417 barcode it gives you the option to download the .png, .jpg, and YES the .eps file!
You can pipe this .eps file to Ghostscript and tweak the parameters to get the exact pdf417 barcode you are looking for. Then take the result, integrate it into the elaphe library, and actually get a pull request on that thing ....
It seems like a bit of work, but nothing that can't be knocked out in an afternoon. Ideally the elaphe library itself would be brought back into shape, so that it generates these barcodes without everyone having to make this change themselves.
Please note that with this approach it takes us a few seconds to generate a barcode, because it creates a 2000-line eps file and pipes it to Ghostscript, which generates another image file that we send back as the final barcode. This is not as performant as code128 with reportlab.
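The EPS-to-image step described above can be sketched roughly as follows. This is a minimal sketch, not the code from our fork: it assumes Ghostscript is installed as `gs` on the PATH, and the function names, the 300 dpi default and the choice of the `png16m` device are illustrative assumptions.

```python
import shutil
import subprocess


def ghostscript_command(eps_path, png_path, dpi=300):
    """Build a Ghostscript command line that rasterises an EPS file to PNG."""
    return [
        "gs",
        "-dSAFER", "-dBATCH", "-dNOPAUSE", "-q",  # non-interactive, quiet run
        "-dEPSCrop",                # crop the page to the EPS bounding box
        "-sDEVICE=png16m",          # 24-bit PNG output device
        f"-r{dpi}",                 # output resolution in dpi
        f"-sOutputFile={png_path}",
        str(eps_path),
    ]


def eps_to_png(eps_path, png_path, dpi=300):
    """Run Ghostscript on eps_path; raises if gs is missing or fails."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript ('gs') was not found on PATH")
    subprocess.run(ghostscript_command(eps_path, png_path, dpi), check=True)
```

Running the command by hand first (for example on the .eps downloaded from the online generator) is a quick way to see whether Ghostscript accepts the file before touching any library code.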
Perhaps there is room for optimization: Is Pillow faster than PIL in any way? Do we need all the parts of the eps file to generate a pdf417 barcode? Are there other ways to optimize?
Anyway, great question Ken and I hope you find this to be a great answer.
I guess the issue in elaphe reported by Matteius in 2013 has been fixed, since the issues and commit logs show updates on the pdf417 topic since then.
Anyway, there are now a few other options (got the list with either pip search elaphe or pip search pdf417) :
elaphe ;
elaphe3 (fork of elaphe tested against python3) ;
candybar (no documentation ? also a webservice) ;
pdf417gen ;
treepoem (about the name : barcode -> bark ode -> tree poem =D ). Edit : I didn't dig into the issue, but as of today generation of PDF417 seems broken.
All but pdf417gen support several types of barcodes.
Note that the documentation of bwipp (on which elaphe and treepoem are based) only mentions 5 levels of error correction (1 to 5), while pdf417gen claims to support 9 security levels (0 to 8).
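For reference, here is a minimal sketch of how the security level could be passed to pdf417gen. It assumes the package's `encode` and `render_image` functions; the wrapper names `make_pdf417` and `error_correction_codewords` are hypothetical, and the 0 to 8 range check simply mirrors the levels mentioned above. In the PDF417 symbology, level n adds 2^(n+1) error correction codewords.

```python
def error_correction_codewords(security_level):
    """Number of error correction codewords PDF417 adds at a given level."""
    if not 0 <= security_level <= 8:
        raise ValueError("PDF417 security level must be between 0 and 8")
    return 2 ** (security_level + 1)


def make_pdf417(text, path, columns=6, security_level=2):
    """Encode text as PDF417 and save it as an image (hypothetical helper)."""
    error_correction_codewords(security_level)  # validates the level
    # Imported lazily so the helper above works without pdf417gen installed.
    from pdf417gen import encode, render_image

    codes = encode(text, columns=columns, security_level=security_level)
    render_image(codes).save(path)  # render_image returns a Pillow image
```

A call such as `make_pdf417("Hello", "barcode.png", security_level=5)` would then write the image to disk, assuming pdf417gen and Pillow are installed.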
Reportlab does have an extension called rlbarcode, but this one does not include support for pdf417 codes. I do not know of any other extension for reportlab including support for pdf417 bar codes.
Anyway, if you are interested in generation of pdf417 codes from python, you may be interested in this project: elaphe.
I have not tested it yet (in fact, I need to generate pdf417 from Python myself, and I found this thread as well as the elaphe project page). I am going to download the elaphe tools and test it right now.
Is it possible to convert Dwg to pdf file using Python?
I have multiple Dwg files that need to convert to pdf.
I am experimenting with creating an app to do the conversion, but I cannot find any solution.
Please, if you have any idea help.
Vector Express is a free API you may be able to use (it requires a network connection, though).
https://github.com/smidyo/vectorexpress-api
Of course it is possible to write Python code that publishes a DWG file as PDF.
However, DWG is a proprietary file format from Autodesk (the maker of AutoCAD), therefore there are a lot of legal issues around distributing the code (beyond your private usage), unless you get written consent from Autodesk...
You'll find several applications that already do the job.
Maybe check Open Design Alliance as a starting point.
Using QCAD Command Line Tools (Open Source)
According to qcad.org "QCAD is a free, open source application for computer aided drafting (CAD) in two dimensions (2D). With QCAD you can create technical drawings such as plans for buildings, interiors, mechanical parts or schematics and diagrams. QCAD works on Windows, macOS and Linux. The source code of QCAD is released under the GPL version 3 (GPLv3), a popular Open Source license."
Supported Command Line Conversions
dwginfo
dwg2csv
dwg2bmp
dwg2svg
dwg2pdf
dwgmapconvert
dwg2maptiles
dwg2dwg / pdf2dwg
dwghatch
dwghatcharea
dwgexplode
merge
bbox
dwgnest (QCAD/CAM)
The above commands are largely self-explanatory. For further information, please refer to qcad.org.
Usage in Python
This might not be best practice, but shelling out to the dwg2pdf command-line tool (for example with os.system) might be enough for most use cases.
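A minimal sketch of that idea, using `subprocess` rather than `os.system` so that failures raise an exception. It assumes the QCAD `dwg2pdf` script is on the PATH and accepts `-f` (overwrite), `-a` (auto fit) and `-o` (output file); these flags are assumptions to verify against `dwg2pdf -h` for your QCAD version, and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def dwg2pdf_command(dwg_path, pdf_path=None, executable="dwg2pdf"):
    """Build the command line for one DWG -> PDF conversion."""
    dwg_path = Path(dwg_path)
    # Default: write the PDF next to the drawing, with the same base name.
    pdf_path = Path(pdf_path) if pdf_path else dwg_path.with_suffix(".pdf")
    # Assumed flags: -f overwrite, -a auto fit to page, -o output file.
    return [executable, "-f", "-a", "-o", str(pdf_path), str(dwg_path)]


def convert_folder(folder, executable="dwg2pdf"):
    """Convert every .dwg file in folder, stopping at the first failure."""
    for dwg in sorted(Path(folder).glob("*.dwg")):
        subprocess.run(dwg2pdf_command(dwg, executable=executable), check=True)
```

On Windows the executable would typically be passed explicitly, for example the full path to the script inside the QCAD installation directory.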
I'm new to programming and recently started getting into Python a lot more seriously. However, I've done some projects that required programming in my company, so I have some background on how it works (or how to scour the internet! lol).
Recently however, we have had a client that sends us invoices in PDF formats and we would like to automate all the invoices to compile into one .csv file.
I've been picking up a few OCR codes (I ran my first image-to-text output recently), however I don't think I'm 100% capable of creating such automation yet since I'm still very fresh in programming. It would require at least a few weeks, and I'm not sure if it's worth it if we could just ask the client to set up a more accurate excel spreadsheet to send over every time.
That's why I'm turning to an already available OCR tool. I recently found this gem: https://www.pdftoexcel.com/ however it is a very manual process and not as automated as we would like. Is there a way to write a script that picks up a PDF file from a certain folder, uploads it to the website, and exports it as an Excel file every time we receive an invoice? If so, could you share it?
It would also be a big plus if there would be a way to upload a batch of invoices and identify different charges, providing a summary across the scanned invoices, particularly in the categories
I hope what I'm asking for makes sense. Let me know if you'd require more clarification.
Cheers
There's lots of stuff available for Python, if you have a quick search on Google or StackOverflow. I believe I used Tesseract OCR in the past.
My experience is that you will get serviceable OCR with some of the popular Python libraries, but the excellent stuff will come with a price tag.
Try some tests with your PDF invoices, but if you are getting even slightly questionable results, you may have to consider more expensive alternatives (or even standalone machinery!).
If the client is sending you clear, nicely formatted PDFs with a good font, I don't see why free Python libraries wouldn't be sufficient.
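As a starting point for such a test, here is a minimal sketch that OCRs every PDF in a folder and dumps the raw text into one CSV file. It assumes the `pytesseract` and `pdf2image` packages (plus the Tesseract and Poppler programs they wrap) are installed; the function names and the CSV layout are hypothetical, and turning the raw text into invoice fields would still need per-client parsing.

```python
import csv
from pathlib import Path


def ocr_pdf(pdf_path):
    """Return the OCR text of each page of a PDF (assumed dependencies)."""
    # Imported lazily so the CSV logic can be tried without OCR installed.
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(str(pdf_path))  # one Pillow image per page
    return [pytesseract.image_to_string(page) for page in pages]


def invoices_to_csv(folder, csv_path, ocr=ocr_pdf):
    """Write one CSV row per page: file name, page number, OCR text."""
    rows = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        for page_no, text in enumerate(ocr(pdf), start=1):
            rows.append([pdf.name, page_no, text.strip()])
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file", "page", "text"])
        writer.writerows(rows)
    return len(rows)
```

Looking at the `text` column for a handful of real invoices should show quickly whether the free route is accurate enough.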
I am using Xyce, which is a circuit simulator, to export a .CSV file and a .prn file. I found gnuplotXyce.py "https://github.com/OpenXyce/Xyce/blob/master/utils/gnuplotXyce.py". I am trying to use it to plot my output variables from Xyce; however, every time I run gnuplotXyce.py as described by its author I get an "Import Error" at the "from finblock import findblock" line, and I don't know what that error means.
Please help.
Thanks
If you are going to use Xyce, you should probably get the official version from Sandia National Laboratories instead of from the OpenXyce site on github. This version was forked by an anonymous github user, and has not been updated since last fall. Since that update, Sandia released Xyce 6.2 and the OpenXyce creator did not import the new release.
You should also probably join the xyce-users group on googlegroups, where the Xyce developers monitor all questions and try to answer them promptly. It is only by happenstance that I found your question here on stackoverflow.
The "gnuplotXyce.py" script is not really maintained, and might not have been kept working with all the changes that have been made to Xyce since its release. That said, the python script depends on a number of python modules including gnuplot-py which should be available from http://gnuplot-py.sourceforge.net. The "findblock.py" module that you say cannot be found is also present in the "utils" directory of the Xyce source code, alongside gnuplotXyce.py. If you have the whole utils directory downloaded, this error should go away.
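If the script is run from somewhere other than the utils directory, Python will not find findblock.py on its own. A minimal sketch of one way to check this, assuming you know where the Xyce utils directory lives on your machine (the helper name and any path you pass in are hypothetical):

```python
import importlib
import importlib.util
import sys
from pathlib import Path


def make_findblock_importable(utils_dir):
    """Put the Xyce utils directory on sys.path and report whether
    the findblock module can now be located."""
    utils_dir = str(Path(utils_dir).resolve())
    if utils_dir not in sys.path:
        sys.path.insert(0, utils_dir)
    importlib.invalidate_caches()  # pick up files added since startup
    return importlib.util.find_spec("findblock") is not None
```

Setting the PYTHONPATH environment variable to the utils directory before running gnuplotXyce.py has the same effect without touching any code.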
I just tried gnuplotXyce.py on a simple netlist with csv output and it didn't work, so my assumption is that the script was not maintained and will need to be fixed.
The script does sort of work if you use the native Xyce standard (.prn) format (i.e. don't specify "format=csv" on your .print line). Unfortunately, it does not leave the window open after it finishes plotting, so it is rather useless. If you use the "--ps" option, though, a correct postscript file will be created that can be viewed in any postscript viewer, or printed on a postscript printer (or through a properly set-up Linux CUPS printer that understands postscript).
The CSV format in Xyce was primarily created to allow import into spreadsheets such as Excel or OpenOffice-scalc, which have their own plotting utilities.
The ".prn" standard format works well in gnuplot. There is an example of how to use gnuplot to do this display in the document "Using Open Source Schematic Capture Tools With Xyce" on the Sandia Labs Xyce web site (in the documentation and tutorials section).
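If you would rather post-process the .prn file in Python, a minimal reader could look like the sketch below. It assumes the usual layout of whitespace-separated columns with one header line at the top and a non-numeric trailer line at the end; that layout is an assumption to check against your own output, and `read_prn` is a hypothetical helper, not part of Xyce.

```python
def read_prn(lines):
    """Parse .prn-style text into (header, rows).

    header is the list of column names from the first non-empty line;
    rows holds one list of floats per data line. Lines that are not
    purely numeric (such as a trailer line) are skipped.
    """
    header, rows = None, []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if header is None:
            header = fields
            continue
        try:
            rows.append([float(field) for field in fields])
        except ValueError:
            continue  # not a data line
    return header, rows
```

Usage would be along the lines of `with open("circuit.prn") as fh: header, rows = read_prn(fh)`, after which the columns can be handed to any plotting tool.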
The official Xyce web site is http://xyce.sandia.gov/
First of all, I agree that this might sound like a question which has already been asked many times in the past. However I couldn't find any answer that was relevant to me in the similar questions so I'll try to be more specific.
I would need to transform PPTX/DOCX files into PDF using Python but I don't have any experience in file format conversion. I have been looking in many places/forums/websites, read a lot of documentation and came across some useful libraries (python-pptx and pyPdf mainly), but I still don't know where to start.
When looking on the Internet, I can see many websites that offer file format conversion as a paid service, even with advanced APIs: submit a file via POST and get the converted PDF file in return. This could work for me, but I am really interested in writing the code that converts OOXML to PDF myself.
How would you start doing this? Or is it just impossible on my own?
Thanks for your help!
After some research and with the help of python-pptx's creator, I was able to write to the PowerPoint COM interface using a Virtual Machine.
In case someone reads this thread, this is how I managed to get this done:
- Setup a VM with Microsoft Windows/Office installed on it ;
- Install Python, Django and win32com libraries on the VM.
The files are sent from the original Django project to the virtual machine (both are on the same network) through a simple POST request. The file is converted on the VM with a simple call to the win32com.client library and then sent back as a response to the original Django view, which in turn processes the response.
Note: it took me some time to realize I needed to use the @csrf_exempt decorator for this setup to work.
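The conversion step on the VM can be sketched roughly as follows. This is not my exact code, only a minimal sketch: it assumes pywin32 and Microsoft Office are installed on the Windows machine, the helper names are hypothetical, and 17 and 32 are the Office constants wdFormatPDF and ppSaveAsPDF.

```python
from pathlib import Path

WD_FORMAT_PDF = 17    # Word: wdFormatPDF
PP_SAVE_AS_PDF = 32   # PowerPoint: ppSaveAsPDF


def office_target(path):
    """Pick the COM application and PDF format constant for a file."""
    ext = Path(path).suffix.lower()
    if ext == ".docx":
        return "Word.Application", WD_FORMAT_PDF
    if ext == ".pptx":
        return "PowerPoint.Application", PP_SAVE_AS_PDF
    raise ValueError(f"unsupported file type: {ext!r}")


def convert_to_pdf(src, dst):
    """Convert a DOCX/PPTX file to PDF through Office (Windows only)."""
    import win32com.client  # lazy import: only available on Windows

    prog_id, pdf_format = office_target(src)
    # COM needs absolute paths.
    src, dst = str(Path(src).resolve()), str(Path(dst).resolve())
    app = win32com.client.Dispatch(prog_id)
    try:
        if prog_id == "Word.Application":
            doc = app.Documents.Open(src)
            doc.SaveAs(dst, FileFormat=pdf_format)
            doc.Close()
        else:
            deck = app.Presentations.Open(src, WithWindow=False)
            deck.SaveAs(dst, pdf_format)
            deck.Close()
    finally:
        app.Quit()
```

The Django view on the VM would save the uploaded file to disk, call something like `convert_to_pdf`, and return the resulting PDF in the response.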
On an Ubuntu server, I want to create pdfs which include other static pdfs. I have tried using ReportLab with pyPdf. Ideally I would use ReportLab to do the whole thing, but importing the pdfs requires their PageCatcher, which has a large recurring fee.
So I use pyPdf to merge a page created with ReportLab and my other pdfs. The problem is that even though this looks fine in Acrobat and Foxit, part of one of the pages prints garbled on a Xerox 7400 color printer. I can't figure out the issue, but would be willing to buy a more integrated solution if it existed and was reasonably priced. I thought PDF Creator Pilot was it until I saw that it was Windows only.
So is there a reasonably priced ($1K or less) solution or a different suggestion?
I have had a lot of success with the Java library iText. They have a great library of samples for pretty much anything you could think of doing with PDF files. This example is for concatenating PDF files and sounds like it would do what you need: http://itextpdf.com/examples/index.php?page=example&id=123. There is also PDFBox which is another great Java based PDF manipulation library.
I realize that you are looking for a Python based solution but there may not be many other options. If you are using the Jython interpreter instead of CPython, integrating iText should be trivial. If not, then you could consider calling out to it as a separate process. I realize that may not be ideal for your situation but I figured I would mention it as an option.
Another non-Python answer. If you are just merging pages, then pdftk does that well (along with a lot of other things).
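Since the rest of the pipeline is in Python, pdftk can be driven from there. A minimal sketch, assuming `pdftk` is installed and on the PATH; the helper names are hypothetical, while `cat output` is pdftk's standard concatenation syntax.

```python
import subprocess


def pdftk_merge_command(inputs, output):
    """Build a pdftk command that concatenates inputs, in order."""
    if not inputs:
        raise ValueError("at least one input PDF is required")
    return ["pdftk", *map(str, inputs), "cat", "output", str(output)]


def merge_pdfs(inputs, output):
    """Run pdftk; raises CalledProcessError if the merge fails."""
    subprocess.run(pdftk_merge_command(inputs, output), check=True)
```

For example, `merge_pdfs(["cover.pdf", "static.pdf"], "combined.pdf")` would put the ReportLab-generated page in front of the static one.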