Python and downloading Google Sheets feeds - python

I'm trying to download a spreadsheet from Google Drive inside a program I'm writing (so the data can be easily updated across all users), but I've run into a few problems:
First, and perhaps foolishly, I'm only wanting to use the basic python distribution, so I'm not requiring people to download multiple modules to run it. The urllib.request module seems to work well enough for basic downloading, specifically the urlopen() function, when I've tested it on normal webpages (more on why I say "normal" below).
Second, most questions and answers on here deal with retrieving a .csv from the spreadsheet. While this might work even better than trying to parse the feeds (and I have actually gotten it to work), using only the basic address means only the first sheet is downloaded, and I need to add a non-obvious gid to get the others. I want to have the program independent of the spreadsheet, so I only have to add new data online and the clients are automatically updated; trying to find a gid programmatically gives me trouble because:
Third, I can't actually get the feeds (interface described here) to be downloaded correctly. That does seem to be the best way to get what I want—download the overview of the entire spreadsheet, and from there obtain the addresses to each sheet—but if I try to send that through urlopen(feed).read() it just returns b''. While I'm not exactly sure what the problem is, I'd guess that the webpage is empty very briefly when it's first loaded, and that's what urlopen() thinks it should be returning. I've included what little code I'm using below, and was hoping someone had a way of working around this. Thanks!
import urllib.request as url
key = '1Eamsi8_3T_a0OfL926OdtJwLoWFrGjl1S2GiUAn75lU'
gid = '1193707515'
# Single sheet in CSV format
# feed = 'https://docs.google.com/spreadsheets/d/' + key + '/export?format=csv&gid=' + gid
# Document feed
feed = 'https://spreadsheets.google.com/feeds/worksheets/' + key + '/private/full'
csv = url.urlopen(feed).read()
(I don't actually mind publishing the key/gid, because I am planning on releasing this if I ever finish it.)

Requres OAuth2 or a password.
If you log out of google and try again with your browser, it fails (It failed when I did logged out). It looks like it requires a google account.
I did have it working with and application password a while ago. But I now use OAuth2. Both are quite a bit of messing about compared to CSV.

This sounds like a perfect use case for a wrapper library i once wrote. Let me know if you find it useful.

Related

Is there a way to automatically download data from a utility using python?

I'm trying to download files automatically from a utility, like PG&E, so I can view my statements easily since I have a couple of different accounts for one login. Currently, I'm using python to figure it out, but I'm stuck on how to actually write the code to download the files. As of right now, I can only access all of the statement information within Python, but cannot figure out how to download the actual bills itself. Does anyone have additional information or resources that you can share so I can try downloading bills using python?

Python-based PDF parser integrated with Zapier

I am working for a company which is currently storing PDF files into a remote drive and subsequently manually inserting values found within these files into an Excel document. I would like to automate the process using Zapier, and make the process scalable (we receive a large amount of PDF files). Would anyone know any applications useful and possibly free for converting PDFs into Excel docs and which integrate with Zapier? Alternatively, would it be possible to create a Python script in Zapier to access the information and store it into an Excel file?
This option came to mind. I'm using google drive as an example, you didn't say what you where using as storage, but Zapier should have an option for it.
Use cloud convert, doc parser (depends on what you want to pay, cloud convert at least gives you some free time per month, so that may be the closest you can get).
Create a zap with this step:
Trigger on new file in drive (Name: Convert new Google Drive files with CloudConvert)
Convert file with CloudConvert
Those are two options by Zapier that I can find. But you could also do it in python from your desktop by following something like this idea. Then set an event controller in windows event manager to trigger an upload/download.
Unfortunately it doesn't seem that you can import JS/Python libraries into zapier, however I may be wrong on that. If you could, or find a way to do so, then just use PDFminer and "Code by Zapier". A technician might have to confirm this though, I've never gotten libraries to work in zaps.
Hope that helps!

Is pandas a local-only library

I recently started coding, but took a brief stint. I started a new job and I’m under some confidential restrictions. I need to make sure python and pandas are secure before I do this—I’ll also be talking with IT on Monday
I was wondering if pandas in python was a local library, or does the data get sent to or from elsewhere? If I write something in pandas—will the data be stored somewhere under pandas?
The best example of what I’m doing is best found on a medium article about stripping data from tables that don’t have csv Exports.
https://medium.com/#ageitgey/quick-tip-the-easiest-way-to-grab-data-out-of-a-web-page-in-python-7153cecfca58
Creating a DataFrame out of a dict, doing vectorized operations on its rows, printing out slices of it, etc. are all completely local. I'm not sure why this matters. Is your IT department going to say, "Well, this looks fishy—but some random guy on the internet says it's safe, so forget our policies, we'll allow it"? But, for what it's worth, you have this random guy on the internet saying it's safe.
However, Pandas can be used to make network requests. Some of the IO functions can take a URL instead of a filename or file object. Some of them can also use another library that does so—e.g., if you have lxml installed, read_html, will pass the filename to lxml to open, and if that filename is an HTTP URL, lxml will go fetch it.
This is rarely a concern, but if you want to get paranoid, you could imagine ways in which it might be.
For example, let's say your program is parsing user-supplied CSV files and doing some data processing on them. That's safe; there's no network access at all.
Now you add a way for the user to specify CSV files by URL, and you pass them into read_csv and go fetch them. Still safe; there is network access, but it's transparent to the end user and obviously needed for the user's task; if this weren't appropriate, your company wouldn't have asked you to add this feature.
Now you add a way for CSV files to reference other CSV files: if column 1 is #path/to/other/file, you recursively read and parse path/to/other/file and embed it in place of the current row. Now, what happens if I can give one of your users a CSV file where, buried at line 69105, there's #http://example.com/evilendpoint?track=me (an endpoint which does something evil, but then returns something that looks like a perfectly valid thing to insert at line 69105 of that CSV)? Now you may be facilitating my hacking of your employees, without even realizing it.
Of course this is a more limited version of exactly the same functionality that's in every web browser with HTML pages. But maybe your IT department has gotten paranoid and clamped down security on browsers and written an application-level sniffer to detect suspicious followup requests from HTML, and haven't thought to do the same thing for references in CSV files.
I don't think that's a problem a sane IT department should worry about. If your company doesn't trust you to think about these issues, they shouldn't hire you and assign you to write software that involves scraping the web. But then not every IT department is sane about what they do and don't get paranoid about. ("Sure, we can forward this under-1024 port to your laptop for you… but you'd better not install a newer version of Firefox than 16.0…")

migrating data from tomcat .dbx files

I want to migrate data from an old Tomcat/Jetty website to a new one which runs on Python & Django. Ideally I would like to populate the new website by directly reading the data from the old database and storing them in the new one.
Problem is that the database I was given comes in the form of a bunch of WEB-INF/data/*.dbx and I didn't find any way to read them. So, I have a few questions.
Which format do the WEB-INF/data/*.dbx use?
Is there a python module for directly reading from the WEB-INF/data/*.dbx files?
Is there some external tool for dumpint the WEB-INF/data/*.dbx to an ascii format that will be parsable by python?
If someone has attempted a similar data migration, how does it compare against scraping the data from the old website? (assuming that all important data can be scraped)
Thanks!
The ".dbx" suffix has been used by various softwares over the years so it could be almost anything. The only way to know what you really have here is to browse the source code of the legacy java app (or the relevant doc or ask the author etc).
wrt/ scraping, it's probably going to be a lot of a pain for not much results, depending on the app.

want to add url links to .csv datafeed using python

ive looked through the current related questions but have not managed to find anything similar to my needs.
Im in the process of creating a affiliate store using zencart - now one of the issues is that zencart is not designed for redirects and affiliate stores but it can be done. I will be changing the store so it acts like a showcase store showing prices.
There is a mod called easy populate which allows me to upload datafeeds. This is all well and good however my affiliate link will not be in each product. I can do it manually after uploading the data feed and going to each product and then adding it as an image with a redirect link - However when there are over 500 items its going to be a long repetitive and time consuming job.
I have been told that I can add the links to the data feed before uploading it to zencart and this should be done using python. Ive been reading about python for several days now and feel im looking for the wrong things. I was wondering if someone could please advise the simplest way for me to get this done.
I hope the question makes sense
thanks
abs
You could craft a python script using csv module like this:
>>> import csv
>>> cartWriter = csv.writer(open('yourcart.csv', 'wb'))
>>> cartWriter.writerow(['Product', 'yourinfo', 'yourlink'])
You need to know how link should be formatted hoping that it could be composed using the other parameters present on csv file.
First, use the CSV module as systempuntoout told you, secondly, you will want to change your header to:
mimetype='text/csv'
Content-Disposition = 'attachment; filename=name_of_your_file.csv'
The way to do it depends very much of your website implementation. In pure Python you would probably do that with an HttpResponse object. In django, as well, but there are some shortcuts.
You can find a video demonstrating how to create CSV files with Python on showmedo. It's not free however.
Now, to provide a link to download the CSV, this depends of your Website. What is the technology behinds it : pure Python, Django, Pylons, Tubogear ?
If you can't answer the question, you should ask your boss a training about your infrastructure before trying to make change to it.

Categories