Python-based PDF parser integrated with Zapier - python

I work for a company that currently stores PDF files on a remote drive and then manually copies values from these files into an Excel document. I would like to automate the process using Zapier and make it scalable (we receive a large number of PDF files). Does anyone know of any applications, ideally free, that convert PDFs into Excel documents and integrate with Zapier? Alternatively, would it be possible to create a Python script in Zapier to access the information and store it in an Excel file?

This option came to mind. I'm using Google Drive as an example; you didn't say what you were using for storage, but Zapier should have an option for it.
Use CloudConvert or Docparser, depending on what you want to pay. CloudConvert at least gives you some free conversion time per month, so that may be the closest you can get to free.
Create a zap with these steps:
Trigger on new file in drive (Name: Convert new Google Drive files with CloudConvert)
Convert file with CloudConvert
Those are two options by Zapier that I can find. But you could also do it in Python from your desktop by following something like this idea. Then set an event controller in Windows event manager to trigger an upload/download.
Unfortunately, it doesn't seem that you can import JS/Python libraries into Zapier, though I may be wrong about that. If you can, or find a way to do so, then just use PDFMiner with "Code by Zapier". A technician might have to confirm this though; I've never gotten libraries to work in zaps.
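For the desktop route, a minimal sketch might look like the following. It assumes pdfminer.six is installed (pip install pdfminer.six) and that your PDFs have a text layer. The field names and regular expressions are hypothetical placeholders, so swap in whatever values you actually copy out of the files. It writes a CSV, which Excel opens directly, to stay within the standard library.

```python
import csv
import re
from pathlib import Path

# Hypothetical field patterns; adjust them to the layout of your own PDFs.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*(?:No\.?|#)\s*:?\s*(\S+)", re.I),
    "total": re.compile(r"Total\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.I),
}


def extract_fields(text):
    """Return the first match for each pattern, or an empty string if none."""
    row = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        row[name] = match.group(1) if match else ""
    return row


def pdf_to_text(pdf_path):
    """Extract the text layer of a PDF with pdfminer.six (assumed installed)."""
    from pdfminer.high_level import extract_text  # third-party import
    return extract_text(str(pdf_path))


def pdfs_to_csv(pdf_dir, csv_path):
    """Parse every PDF in pdf_dir and write one CSV row per file."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["file", *FIELD_PATTERNS])
        writer.writeheader()
        for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
            row = extract_fields(pdf_to_text(pdf_path))
            writer.writerow({"file": pdf_path.name, **row})
```

Calling pdfs_to_csv("incoming", "values.csv") would then process a folder in one go (both names are made up for the example). Scanned PDFs without a text layer would need OCR instead.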
Hope that helps!

Related

Is there a way to automatically download data from a utility using python?

I'm trying to download files automatically from a utility, like PG&E, so I can view my statements easily, since I have a couple of different accounts under one login. Currently, I'm using Python to figure it out, but I'm stuck on how to actually write the code to download the files. As of right now, I can access all of the statement information within Python, but cannot figure out how to download the actual bills themselves. Does anyone have additional information or resources to share so I can try downloading bills using Python?

Automated OCR program or off-the-shelf tool to scan PDFs/images and output into excel documents

I'm new to programming and recently started getting into Python a lot more seriously. However, I've done some projects that required programming in my company, so I have some background on how it works (or how to scour the internet! lol).
Recently however, we have had a client that sends us invoices in PDF formats and we would like to automate all the invoices to compile into one .csv file.
I've been picking up a few OCR snippets (I ran my first image-to-text output recently), however I don't think I'm 100% capable of creating such automation yet since I'm still very fresh in programming. It would require at least a few weeks, and I'm not sure it's worth it if we could just ask the client to set up a more accurate Excel spreadsheet to send over every time.
That's why I'm turning to an already available OCR tool. I recently found this gem: https://www.pdftoexcel.com/ however it is a very manual process and not as automated as we would like. If there is a way to write a script that takes a PDF file from a certain folder, uploads it to the website, and exports it to an Excel file every time we receive an invoice, would it be possible to share?
It would also be a big plus if there would be a way to upload a batch of invoices and identify different charges, providing a summary across the scanned invoices, particularly in the categories
I hope what I'm asking for makes sense. Let me know if you'd require more clarification.
Cheers
There's lots of stuff available for Python, if you have a quick search on Google or StackOverflow. I believe I used Tesseract OCR in the past.
My experience is that you will get serviceable OCR with some of the popular Python libraries, but the excellent stuff will come with a price tag.
Try some tests with your PDF invoices, but if you are getting even slightly questionable results, you may have to consider more expensive alternatives (or even standalone machinery!).
If the client is sending you clear, nicely formatted PDFs with a good font, I don't see why free Python libraries wouldn't be sufficient.
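As a starting point for such a test, here is a rough sketch. It assumes the Tesseract binary plus the pytesseract and Pillow packages are installed, and that each invoice page is already available as an image. The line format it looks for (a description followed by an amount) is only a guess at what your invoices contain.

```python
import re
from collections import defaultdict
from decimal import Decimal

# Assumed line shape: free-text description, then an amount such as 1,250.00
LINE_ITEM = re.compile(r"^(?P<label>.+?)\s+\$?(?P<amount>\d[\d,]*\.\d{2})\s*$")


def ocr_image(image_path):
    """Run Tesseract on one page image (needs pytesseract and Pillow)."""
    import pytesseract
    from PIL import Image
    return pytesseract.image_to_string(Image.open(image_path))


def parse_charges(text):
    """Yield (label, amount) pairs for lines that look like charges."""
    for line in text.splitlines():
        match = LINE_ITEM.match(line.strip())
        if match:
            amount = Decimal(match.group("amount").replace(",", ""))
            yield match.group("label").strip().lower(), amount


def summarize(texts):
    """Add up charges with the same label across several invoices."""
    totals = defaultdict(Decimal)
    for text in texts:
        for label, amount in parse_charges(text):
            totals[label] += amount
    return dict(totals)
```

Feeding summarize() the output of ocr_image() for each page gives a per-label total across the batch. If the totals look wrong on a handful of real invoices, that is a good sign the OCR quality is not sufficient.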

How can I set up an automated import to Google Data Prep?

When using Google Data Prep, I am able to create automated schedules to run jobs that update my BigQuery tables.
However, this seems pointless when considering that the data used in Prep is updated by manually dragging and dropping CSVs (or JSON, xlsx, whatever) into the data storage bucket.
I have attempted to search for a definitive way of updating this bucket automatically with files that are regularly updated on my PC, but there seems to be no best-practice solution that I can find.
How should one go about doing this efficiently and effectively?
So, in order to upload files from your computer to Google Cloud Storage, there are a few possibilities. If you run a daemon process that handles any change in that shared directory, you can code the automatic upload in any of these languages: C#, Go, Java, Node.js, PHP, Python or Ruby.
Here you have some code examples for uploading objects, but be aware that there is also a detailed Cloud Storage Client Libraries reference, and you can find the GitHub links under "Additional Resources".
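A minimal Python sketch of such a daemon could look like this. It assumes the google-cloud-storage package is installed and that credentials are already configured on the machine. The polling approach and the function names are my own choices for illustration, not something taken from the Cloud Storage docs.

```python
import os
import time


def changed_files(directory, seen):
    """Return paths whose modification time differs from the one in `seen`.

    `seen` maps path -> last handled mtime and is updated in place.
    """
    changed = []
    for entry in os.scandir(directory):
        if not entry.is_file():
            continue
        mtime = entry.stat().st_mtime
        if seen.get(entry.path) != mtime:
            seen[entry.path] = mtime
            changed.append(entry.path)
    return sorted(changed)


def upload(bucket, path):
    """Upload one file, using its base name as the object name."""
    blob = bucket.blob(os.path.basename(path))
    blob.upload_from_filename(path)


def watch(directory, bucket_name, interval=60):
    """Poll `directory` forever and upload new or modified files."""
    from google.cloud import storage  # third-party: google-cloud-storage
    bucket = storage.Client().bucket(bucket_name)
    seen = {}
    while True:
        for path in changed_files(directory, seen):
            upload(bucket, path)
        time.sleep(interval)
```

Polling once a minute is crude but has no extra dependencies. A file-system notification library would react faster, at the cost of another package to install.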

How can I get the data from LocalStorage of a Chrome extension using Python?

I am using the TimeStats extension on Chrome. What I want to do now is read the data in its LocalStorage (which contains all the information about the time I spent on each website) from a Python script and process it later.
I know that Ctrl+C and Ctrl+V would work in this case, but I am wondering whether there is an elegant and reliable way to do that.
Thanks!
You can use native messaging to send data between your extension and an external app. The sample app for demonstrating native messaging is written in Python, so you have the communications part already solved.
EDIT:
I see now that you are talking about an extension you don't own. Google Chrome currently stores LocalStorage data in SQLite format, so you should be able to read it directly using the sqlite3 package. See the answers to this question.
The file for the TimeStats extension would be chrome-extension_ejifodhjoeeenihgfpjijjmpomaphmah_0.localstorage
Note that Google can change the way of storing LocalStorage at any time.
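Reading such a file with sqlite3 could be sketched as below. It assumes the layout I have seen in these .localstorage files: a table named ItemTable with key and value columns, where the values are UTF-16-LE encoded blobs. Treat the table name and the encoding as assumptions and check them against your own file first.

```python
import sqlite3
from pathlib import Path


def read_local_storage(db_path):
    """Return {key: value} from a Chrome .localstorage SQLite file.

    Assumes a table named ItemTable with `key` and `value` columns,
    where values are UTF-16-LE encoded blobs.
    """
    # Open read-only so the browser's file is never modified.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute("SELECT key, value FROM ItemTable").fetchall()
    finally:
        conn.close()
    items = {}
    for key, value in rows:
        if isinstance(value, bytes):
            value = value.decode("utf-16-le")
        items[key] = value
    return items
```

It is probably safest to copy the file somewhere else before reading it, since Chrome may hold a lock on it while running.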

GAE Python how to check file type on upload

So, I'm trying to create a Google App Engine (Python) app that allows people to share files. I have file uploads working well, but my concern is about checking the file extension and making sure, primarily, that the files are read only, and secondly, that they are of the file type that is specified. These will not be image files, as I know there are a lot of resources for images already. Specifically, .stl mesh files, but I'd like to be able to do this more generally.
I know there are modules that can do this (python-magic seems to be able to, for example), but I can't seem to find any that I'm able to import without LoadModuleRestricted. I'm considering writing my own parser, but that would be a lot of work for such a common (I'm assuming) issue.
Anyway, I'm totally stumped, so this is my first Stack Overflow question. I hope I'm doing well etiquette-wise. Let me know, and thanks!
It sounds like you want to read the first few bytes of the uploaded file to verify that its signature matches the purported MIME type. Assuming that you're uploading to blobstore (i.e., via a URL obtained from blobstore.get_upload_url()), then once you're redirected to the upload handler whose path you gave to get_upload_url, you can open the blob using a BlobReader, then read and verify the signature.
The Blobstore sample app lays out the framework. You'd glue in code in UploadHandler once you have blob_info (using blob_info.key() to open the blob).
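For .stl specifically, the signature check itself could be sketched as below. This is a heuristic based on the commonly described STL layouts (ASCII files start with "solid"; binary files have an 80-byte header, a little-endian uint32 triangle count, then 50 bytes per triangle), and the function name is made up for the example.

```python
import struct


def looks_like_stl(head, total_size):
    """Heuristically decide whether a file is an STL mesh.

    `head` is the first 84 or more bytes of the file; `total_size` is the
    file size in bytes.
    """
    # ASCII STL files begin with the keyword "solid".
    if head.lstrip()[:5].lower() == b"solid":
        return True
    # Binary STL: 80-byte header, then a little-endian uint32 triangle
    # count, then 50 bytes per triangle.
    if len(head) < 84:
        return False
    (triangles,) = struct.unpack_from("<I", head, 80)
    return total_size == 84 + 50 * triangles
```

In the upload handler, head would presumably come from something like blobstore.BlobReader(blob_info.key()).read(84) and total_size from blob_info.size, but I have not run this on App Engine, so verify it there. Note that any text file starting with "solid" passes the ASCII check, so this only filters out obvious mismatches.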
