Downloading CSV file from website/server with Python 3.X - python

Programming beginner here. So for my very first project I was able to make a quick python script that downloaded files from this website:
http://www.wesm.ph/inner.php/downloads/market_prices_&_schedules
I noticed that the link address of the to-be-downloaded file
followed a pattern.
(http://wesm.ph/admin/downloads/download.php?download=../csv/mpas/XXXXX/XXXX.csv)
With some string concatenation and the datetime module, I was able to build the URL of the csv file. After that, I would just use:
urllib.request.urlopen(HTMLlink).read()
and save it with something like:
with open('output.csv', "w", newline='') as f:
    writer = csv.writer(f)
    writer.writerows(fullList)
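Put together, the approach described might look roughly like this; the date-based path segments are an assumption inferred from the URL pattern above, not the site's actual scheme:
import csv
import urllib.request
from datetime import date

# Hypothetical reconstruction of the dated link; the real path segments
# behind the XXXXX placeholders are not known here.
today = date.today()
csv_link = (
    'http://wesm.ph/admin/downloads/download.php?download=../csv/mpas/'
    f'{today:%Y%m}/{today:%Y%m%d}.csv'
)
raw = urllib.request.urlopen(csv_link).read().decode('utf-8')
fullList = list(csv.reader(raw.splitlines()))
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(fullList)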
It used to work, but now it doesn't. I noticed, however, that whenever I click the 'Generate Report' button and THEN run the script, the script generates the output file. I'm not sure why that works. Is there a way to send a request to their server to generate the actual file? Which module or commands should I use?

Most likely those files are only temporarily stored on that web server after you click 'Generate Report'.
To generate new ones, there might even be a check (in JavaScript, or via cookies or a session ID) to see whether the request for a new link/file comes from a human or a bot.
You might also want to check the HTTP return code (or even the full response headers) to see exactly what the server is answering.
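For example, with just the standard library you can inspect the status code and response headers the server sends back (the URL below is only the placeholder pattern from the question):
import urllib.request
from urllib.error import HTTPError

csv_link = 'http://wesm.ph/admin/downloads/download.php?download=../csv/mpas/XXXXX/XXXX.csv'
try:
    with urllib.request.urlopen(csv_link) as response:
        print(response.status)    # e.g. 200
        print(response.headers)   # full response headers
        body = response.read()
        print(len(body))          # an empty body suggests the file was never generated
except HTTPError as err:
    # urlopen raises for 4xx/5xx status codes, so those show up here
    print(err.code, err.reason)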

Related

How can I update data in a JSON file while using UptimeRobot?

I am creating an Economy Discord Bot using Python and I am hosting it on Replit and keeping it online using UptimeRobot. Sometimes, when people use my bot's economy commands, the data is not updated in the JSON file. I have observed that this only happens when my UptimeRobot monitor brings my bot online and not when I manually run the code. Does anyone know how to work around this?
Here is the code I am using to update the JSON file:
with open("data.json", "w") as file:
file.write(json.dumps(data))
The issue here might be with Replit. Replit reboots your repl every once in a while, even if you have the hacker plan or are using UptimeRobot, and sometimes the JSON file might not be saved before the reboot. In that case the file reverts to its last saved state. As far as I'm aware, there is no way to work around this other than using an external database like MongoDB.
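A minimal sketch of the external-database route, assuming pymongo and a MongoDB connection string kept in an environment variable (both the library choice and the variable name are assumptions, not part of the original bot):
import os
from pymongo import MongoClient

# Hypothetical connection; MONGO_URI would be stored in Replit's secrets
client = MongoClient(os.environ['MONGO_URI'])
balances = client['economy_bot']['balances']

def set_balance(user_id: int, amount: int) -> None:
    # Upsert so new users get a document automatically
    balances.update_one({'_id': user_id}, {'$set': {'balance': amount}}, upsert=True)

def get_balance(user_id: int) -> int:
    doc = balances.find_one({'_id': user_id})
    return doc['balance'] if doc else 0
Because every change goes straight to the database, a repl reboot no longer loses data.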
I would dump your JSON a little differently. P.S. I have never seen this kind of thing happen, so it might just be the way your JSON dump code is written.
with open('data.json', 'w') as f:
    json.dump(data, f, indent=4)
We open the JSON file (data.json, or whatever yours is called) as f and dump your data dictionary into it. The indent=4 just makes the output cleaner; you can drop it if you want.

Internet Shortcut in python

I have a problem. Let's say I have a website (e.g. www.google.com). Is there any way to create a file with a .url extension linking to this website in Python? (I am currently looking for a flat, and I am trying to save shortcuts on my hard drive to apartment offers posted online that match my expectations.) I've tried to use the os and requests modules to create such files, but with no success. I would really appreciate the help. (I am using Python 3.9.6 on Windows 10.)
This is pretty straightforward. I had no idea what .URL files were before seeing this post, so I decided to drag its URL to my desktop. It created a file with the following contents which I viewed in Notepad:
[InternetShortcut]
URL=https://stackoverflow.com/questions/68304057/internet-shortcut-in-python
So, you just need to write out the same thing via Python, except replace the URL with the one you want:
test_url = r'https://www.google.com/'
with open('Google.url', 'w') as f:
    f.write(f"""[InternetShortcut]
URL={test_url}
""")
With regards to your current attempts:
I've tried to use os and requests module to create such file
It's not clear what you're using requests or os for, since you didn't provide a minimal reproducible example of what you've tried so far. If there's a more complex requirement you didn't specify, such as automatically generating the file while you're in your browser, then you need to update your question to include all of your requirements.

Python discord bot has problems writing to textfiles

I'm currently working on a little Discord bot. To host it for free, I'm using an application on heroku.com that is connected to my GitHub. Every time I restart the bot, it reads some previously stored information from a text file (this works perfectly):
f = open("example_textfile.txt", "r")
example_list = dict(json.loads(f.read()))
f.close()
Every time a list gets updated, the bot should overwrite the text file with the updated list like this (this does NOT work):
f = open("example_textfile.txt", "w")
f.write(json.dumps(example_list))
f.close()
If I host the bot locally on my PC, everything works perfectly (then I need the full path, not just the name of the file). But when I host it with Heroku, it can only read the files, not overwrite them. Does anyone know why this doesn't work, or is there any alternative? Would be great if you could help me :D (And sorry for my bad English xD. I'm not a native speaker.)
This should work
json.dump(example_list, open("example_file.txt", "w"))
The reason the write method may not be doing what you expect is that json.dumps only returns a string; whatever it returns is what your program then has to write to the file itself. json.dump, on the other hand, takes the dictionary and writes it straight into the file object.
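To make the distinction concrete, here is a small sketch of both styles; when the write actually succeeds, they produce the same file contents:
import json

example_list = {"user123": 40, "user456": 120}

# json.dumps returns a string, and you write it out yourself
with open("example_textfile.txt", "w") as f:
    f.write(json.dumps(example_list))

# json.dump serializes straight into the open file object
with open("example_textfile.txt", "w") as f:
    json.dump(example_list, f)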

Python requests, how to avoid DDOSing someone

I am writing a plugin for a software I am using. The goal here is to have a button, that will automate downloading data from a government site.
I am still a bit new to Python, but I have managed to get it working exactly like I want. However, I want to avoid a case where my plugin makes hundreds of download requests at once, as that could impact the website's performance. The function below is what I use to download the files.
How can I make sure that what I am doing will not request 1000s of files within few seconds, thus overloading the website? Is there a way to make the below function wait for completing one download, before starting another?
import requests
from os.path import join

def downloadFiles(fileList, outDir):
    # Download list of files, one by one
    for url in fileList:
        file = requests.get(url)
        fileName = url.split('/')[-1]
        open(join(outDir, fileName), 'wb').write(file.content)
This code is already sequential; it waits for each download to finish before starting the next one. It's funny, usually people ask how to parallelize things.
If you want to slow it down further, you can add a time.sleep() call between requests.
If you want to be fancier, you can use a dedicated rate-limiting helper.
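A minimal sketch of the time.sleep() suggestion applied to the function from the question (the one-second default delay is an arbitrary choice):
import time
import requests
from os.path import join

def downloadFiles(fileList, outDir, delay=1.0):
    # Download the files one by one, pausing between requests
    for url in fileList:
        response = requests.get(url)
        fileName = url.split('/')[-1]
        with open(join(outDir, fileName), 'wb') as f:
            f.write(response.content)
        time.sleep(delay)  # be polite to the server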

Python and downloading Google Sheets feeds

I'm trying to download a spreadsheet from Google Drive inside a program I'm writing (so the data can be easily updated across all users), but I've run into a few problems:
First, and perhaps foolishly, I want to use only the basic Python distribution, so I'm not requiring people to download extra modules to run it. The urllib.request module seems to work well enough for basic downloading, specifically the urlopen() function, when I've tested it on normal webpages (more on why I say "normal" below).
Second, most questions and answers on here deal with retrieving a .csv from the spreadsheet. While this might work even better than trying to parse the feeds (and I have actually gotten it to work), using only the basic address means only the first sheet is downloaded, and I need to add a non-obvious gid to get the others. I want to have the program independent of the spreadsheet, so I only have to add new data online and the clients are automatically updated; trying to find a gid programmatically gives me trouble because:
Third, I can't actually get the feeds (interface described here) to be downloaded correctly. That does seem to be the best way to get what I want—download the overview of the entire spreadsheet, and from there obtain the addresses to each sheet—but if I try to send that through urlopen(feed).read() it just returns b''. While I'm not exactly sure what the problem is, I'd guess that the webpage is empty very briefly when it's first loaded, and that's what urlopen() thinks it should be returning. I've included what little code I'm using below, and was hoping someone had a way of working around this. Thanks!
import urllib.request as url
key = '1Eamsi8_3T_a0OfL926OdtJwLoWFrGjl1S2GiUAn75lU'
gid = '1193707515'
# Single sheet in CSV format
# feed = 'https://docs.google.com/spreadsheets/d/' + key + '/export?format=csv&gid=' + gid
# Document feed
feed = 'https://spreadsheets.google.com/feeds/worksheets/' + key + '/private/full'
csv = url.urlopen(feed).read()
(I don't actually mind publishing the key/gid, because I am planning on releasing this if I ever finish it.)
Requires OAuth2 or a password.
If you log out of Google and try again in your browser, it fails (it failed for me when I logged out). It looks like it requires a Google account.
I did have it working with an application password a while ago, but I now use OAuth2. Both are quite a bit of messing about compared to CSV.
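For comparison, the CSV export route from the question needs no OAuth2 as long as the spreadsheet is shared publicly (an assumption about how the sheet is set up), which is why it is so much less fuss:
import urllib.request

key = '1Eamsi8_3T_a0OfL926OdtJwLoWFrGjl1S2GiUAn75lU'
gid = '1193707515'

# Works without authentication only if the sheet is publicly accessible
export_url = ('https://docs.google.com/spreadsheets/d/' + key
              + '/export?format=csv&gid=' + gid)
csv_bytes = urllib.request.urlopen(export_url).read()
print(csv_bytes[:200])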
This sounds like a perfect use case for a wrapper library I once wrote. Let me know if you find it useful.
