Downloading a file from an html? URL with Python 3 - python

I've been searching for hours on how to download a file. The documentation shows me how to do this with cygwin, but cygwin is horrible and an annoyance to use, and I'm trying to implement this in Python 3 for a program. I've tried urllib, requests, wget (in Python), httplib and some others, but they only fetch the redirected page (as you would get if you pasted the link into the URL bar with the properly formatted URL).
However, when I inspect the page and trigger the download link that has the same address I tried, it works properly and gives me a download pop-up. Here is an example page; the download is triggered by clicking "Download data".
I don't get how every Python package is unable to send the proper GET request, or why I would need to implement this program on Linux only to be able to use 'wget'.
Does anyone have a clue on how to properly call the URL?

You need to add &submit=Download+Data to the end of your URL to download the data. You can see this in the Network tab of inspect element in Google Chrome.
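For example, a minimal sketch with requests (the URL below is a placeholder for the page's actual download link):

import requests

# Placeholder URL: substitute the page's real download link,
# with &submit=Download+Data appended as seen in the Network tab
url = "https://example.com/download?dataset=foo&submit=Download+Data"

response = requests.get(url)
response.raise_for_status()

# Write the body as bytes, since the payload is a file
with open("data.csv", "wb") as f:
    f.write(response.content)

Hope I helped!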

I think something like this should work:

from subprocess import call

def download(url):
    # shell out to curl to fetch the given URL
    cmd = ['curl', url]
    call(cmd)

To run this:

download('www.download.com/blah/bah/blah')

If you want to use this from the interpreter, save it as module.py and run:

python -i /path/to/module.py
>>> download('www.download.com/blah/bah/blah')

P.S. If this works I'll probably use this in my shell program.
EDIT: my comment:
I tried this and got a "malformed url" error:

from subprocess import call

def download(file, url):
    # file = file to save to
    # url = download from here
    cmd = ['curl', '-o', file, url]
    call(cmd)

This is what I do for all system commands from Python, so it's something to do with curl specifically.
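If curl keeps rejecting the URL, a rough standard-library fallback (just urllib, no curl involved) would be something like:

from urllib.request import urlretrieve

def download(file, url):
    # file = file to save to
    # url = download from here; urllib needs an explicit scheme,
    # so prepend one if it's missing
    if not url.startswith(('http://', 'https://')):
        url = 'http://' + url
    urlretrieve(url, file)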

Related

How to download a file that takes 5 seconds to finish with python?

I am trying to write some code that would download a file. This file, from this website specifically, takes 5 seconds to actually prompt the download once you go to the link, for example: https://sourceforge.net/projects/esp32-s2-mini/files/latest/download
I have tried using the obvious methods, such as wget.download and urllib.request.urlretrieve:

urllib.request.urlretrieve('https://sourceforge.net/projects/esp32-s2-mini/files/latest/download', 'zzz')
wget.download('https://sourceforge.net/projects/esp32-s2-mini/files/latest/download', 'zzzdasdas')

However, that does not work; it downloads something else, but not what I want it to.
Any suggestions would be great.
Using Chrome's downloads page (Ctrl+J should open it, or just click "Show All" when downloading a file), we can see all of our recent downloads. The link you provided is just the page that begins the download, not the location of the actual file itself. Right-clicking the blue file name lets us copy the address of the actual file being downloaded.
The actual link of the file, in this case, is https://cfhcable.dl.sourceforge.net/project/esp32-s2-mini/ToolFlasher/NodeMCU-PyFlasher-3.0-x64.exe
We can then make a GET request to download the file. Testing this out with bash wget downloads the file properly.
wget https://versaweb.dl.sourceforge.net/project/esp32-s2-mini/ToolFlasher/NodeMCU-PyFlasher-3.0-x64.exe
You can, of course, use python requests to accomplish this as well.
import requests

response = requests.get("https://cfhcable.dl.sourceforge.net/project/esp32-s2-mini/ToolFlasher/NodeMCU-PyFlasher-3.0-x64.exe")
with open("NodeMCU-PyFlasher-3.0-x64.exe", "wb") as f:
    f.write(response.content)
Note that we are using wb (write bytes) mode instead of the default w (write).
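For large files you may not want to hold the whole response in memory at once; a streamed variant of the same requests call might look like this:

import requests

url = "https://cfhcable.dl.sourceforge.net/project/esp32-s2-mini/ToolFlasher/NodeMCU-PyFlasher-3.0-x64.exe"

# stream=True avoids loading the whole file into memory at once
with requests.get(url, stream=True) as response:
    response.raise_for_status()
    with open("NodeMCU-PyFlasher-3.0-x64.exe", "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)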

How can I automate downloads using IDM?

I want to automate downloading using selenium python which in turn carries the link to IDM. However, the thing is I can't get to download using IDM.
This is not good practice in Selenium automation.
Whilst it is possible to start a download by clicking a link with a browser under Selenium's control, the API does not expose download progress, making it less than ideal for testing downloaded files. This is because downloading files is not considered an important aspect of emulating user interaction with the web platform. Instead, find the link using Selenium (and any required cookies) and pass it to an HTTP request library like libcurl.
Please refer to the SeleniumHQ site.
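A rough sketch of that approach (the URL and the a.download selector are illustrative; the cookie hand-off assumes the download needs the browser's session):

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/downloads")  # illustrative URL

# Locate the download link with Selenium instead of clicking it
link = driver.find_element(By.CSS_SELECTOR, "a.download").get_attribute("href")

# Hand the browser's session cookies over to requests
session = requests.Session()
for cookie in driver.get_cookies():
    session.cookies.set(cookie["name"], cookie["value"])

# Download over plain HTTP, where errors are visible
response = session.get(link)
with open("downloaded.file", "wb") as f:
    f.write(response.content)

driver.quit()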
This is the syntax to run IDM in Python. It will download to the default local path. Use an additional parameter '/p' to change the local path if needed.
from subprocess import run
idm_path = r"C:\Program Files (x86)\Internet Download Manager\idman.exe"  # raw string so the backslashes aren't treated as escape sequences
url = "example url"
filename = "song.mp3"
run([idm_path, '/d', url, '/f', filename, '/n'])
source: Start IDM download from command line.

Any way to run web scripts from HTML that the browser runs automatically on page load?

I'm practicing parsing web pages with Python. So what I do is
ans = requests.get(link)
Then I use re to extract some information from the HTML, which is stored in
ans.content
What I've found is that some sites use scripts that are automatically executed in a browser, but not when I download the page using requests. For example, instead of getting a page with the information, I get something like

scripts_to_get_info.run()

in the HTML code.
A browser is installed on my computer, and so is the program that I wrote, which means that, theoretically, I should have a way to run these scripts and get the information from within my Python code, and then parse it.
Is it possible? Any suggestions?
(The idea that this is doable came from the fact that when I inspected the page in Chrome, I saw the real HTML file without any of the trashy scripts.)
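One way to do that is a sketch along these lines with Selenium, which drives a real browser so the page's scripts run before you grab the HTML (the URL is a placeholder):

from selenium import webdriver

# Drive a real browser so the page's scripts actually execute
driver = webdriver.Firefox()
driver.get("https://example.com/page")  # the same link you passed to requests.get

# page_source holds the HTML after the scripts have run,
# so the same re-based extraction can be applied to it
html = driver.page_source
driver.quit()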

Connecting HTML with Python

I tried to link my HTML file with my Python code.
I tried this:
import webbrowser
webbrowser.open_new_tab("data.HTML")
It opened my HTML page in Firefox, but I need to return to my Python program to execute the remaining lines, and when I closed the browser, it closed my Python script too.
I also tried linking back to my Python program from the HTML with a "go to Python" link, but it returns to the text editor, not the terminal, and I need to return to the terminal.
I need a solution.
As someone described, you need to use a web framework (like Flask, Django, or others) to run Python code. The second solution is using mod_wsgi (http://modwsgi.readthedocs.io/en/develop/).
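For instance, a minimal Flask sketch (assuming your data.HTML is moved into a templates/ folder, which is where Flask looks for pages):

from flask import Flask, render_template

app = Flask(__name__)

@app.route("/")
def index():
    # Serves templates/data.HTML; the script keeps running to handle requests
    return render_template("data.HTML")

if __name__ == "__main__":
    app.run()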
For the second problem (wanting to keep running Python code after the browser is closed), I would advise using Selenium.
Cheers, John

Script to download website source to a folder

I am trying to learn simple automation. I have set up an Ubuntu server and I want to configure it to download the HTML source from a specific URL and append it to a file in a specified folder on the server every minute.
The URL is just basic HTML with no CSS whatsoever.
I want to use Python but admittedly could use any language. What is a good, simple way to do this?
Jeff's answer works for one-time use. You could do this to run it repeatedly:
import time
import requests

while True:
    # fetch the page and append it to the file, once a minute
    with open('filename.extension', 'a') as fp:
        newHtml = requests.get('url').text
        fp.write(newHtml)
    time.sleep(60)
You could run this as a background process for as long as you want.
$ python3 script_name.py &
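Alternatively, since this is an Ubuntu server, you could strip the while loop out of the script and let cron run it once a minute instead (a sketch; the paths are placeholders for your own):

* * * * * /usr/bin/python3 /path/to/script_name.py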
Just pip install the requests library.
$ pip install requests
Then, it's super easy to get the HTML (put this in a file called get_html.py, or whatever name you like):
import requests
req = requests.get('http://docs.python-requests.org/en/latest/user/quickstart/')
print(req.text)
There are a variety of options for saving the HTML to a directory. For example, you could redirect the output from the above script to a file by calling it like this:
python get_html.py > file.html
Hope this helps
