How to download a file in Python? - python

So the issue I am having isn't that there is a link of a PDF on the web I am trying to scrape and download onto my PC (It doesn't end in .pdf). I have a download link that I want to activate, which would then lead me to download a PDF onto my computer. It looks like this:
https://***.com/files/4122109/download?download_frd=1&verifier=xxx
When I click the link, it verifies I am the user that I am, and then lets me download the file with the ID contained in the above query. The content-type for this file is "application/pdf" so I know it downloads a PDF file for me. I just need a library that "clicks" or "activates" the download for me.
Also, I am trying to do this for all the URLs I am pulling from a course on Canvas in a GET request. I am not trying to use Selenium here because I am getting these URLs from an API. Any advice in this approach would be highly appreciated.

Related

Not able to scrape the response from clicking on a Button using Python Scrapy

Thank you all wonderful out there for reading this post and their help
For below URL, I have been trying to understand how to go about getting excel files which are downloaded after clicking on "Download Data" hyperlink. On inspecting this element, i get something like this "::before". Not sure what this is.
https://www.moneycontrol.com/mutual-funds/find-fund/returns?&amc=AXMF&EXCLUDE_FIXED_MATURITY_PLANS=Y
I have downloaded files, in somewhat similar cases in the past, where such buttons contain URL directing to the file. I had to then make use of request library to get a bytes response which downloads the file in my local.
However, in this case, i am not able to find the URL to send response to.
Cheers,
Aakash

Download a .xls file from a aspx page using Python

I'm trying to download a .xls from this Site
I need to somehow click on the second button("Exporta informácion diária") on the grid and download the .xls file.
I tried with requests and beautifulsoup but didnt work.
After that, tried with selenium just for some tests and i managed to do what i needed.
Can someone please explain how can i download the .xls file without using a headless browser?
Thank You.
To do this, you first need to understand what the flow of network requests that performs the download.
The easiest way is to open the developer tools in the browser you are using. And follow the appropriate requests.
In your case, there is an POST Request, Which returns the exact address to the file.
Download it with a GET request.

urllib.urlretrieve() cannot download the file

Hi there's a button in the web, if you click it, it'll download a file.
Say the corresponding url is like this
http://www.mydata.com/data/filedownload.aspx?e=MyArgu1&k=kfhk22wykq
If i put this url in the address bar in the browser, it can download the file as well properly.
Now i do this in the python,
urllib.urlretrieve(url, "myData.csv")
The csv file is empty. Any suggestions please ?
This may not be possible with every website. If a link has a token then python is unlikely to be able to use the link as it is tied to your browser.

Download file from webpage which does not have a download link

I am trying to download this excel file using Python.
http://www.bseindia.com/markets/equity/EQReports/MarketWatch.aspx?expandable=2. The excel file is on the right side in the box which says "Top Turnovers - All Market".
I am not an HTML expert but usually all files embedded in web I see has a download link (when I rightclick on download button). This one is just an image of excel icon with no pointer to the download link. However, when you click on it a file is downloaded. This could be a common HTML feature but I am not able to figure it out where the file is located. Even the source code is pointing out to icon image.
However my end goal is to able to download this file through python. I thought I could use beautifulsoup and with my limited knowledge on that I think I need to point to a download link. In this case I do not have one. So is there some other way to do it? May be I am missing something basic but any help on how to download this file would be great. I am not looking for a full code or even a working code. Just some pointers on how to go about it and which package to use. I can find my way once I know what I am suppose to use.
The task of clicking we can do it through the javascript, for this use selenium and the chromedriver.
Code:
from selenium import webdriver
chromedriver = '/usr/bin/chromedriver'
url = "http://www.bseindia.com/markets/equity/EQReports/MarketWatch.aspx?expandable=2"
chrome = webdriver.Chrome(chromedriver)
chrome.get(url)
chrome.execute_script("document.getElementById('ctl00_ContentPlaceHolder1_imgDownload').click();")

Python write web scraper to get pdf files without several logins

How can I download all the pdf (or specific extension files like .tif or .pdf) from a webpage that requires login. I dont want to log in everytime for every pdf so I cant use link generation and pushing to browser scheme
The solution was simple: just posting it for others may have the same question
mydriver.get("https://username:password#www.somewebsite.com/somelink")

Categories