Download file from webpage which does not have a download link

Download file from webpage which does not have a download link - python

I am trying to download this excel file using Python.
http://www.bseindia.com/markets/equity/EQReports/MarketWatch.aspx?expandable=2. The excel file is on the right side in the box which says "Top Turnovers - All Market".
I am not an HTML expert but usually all files embedded in web I see has a download link (when I rightclick on download button). This one is just an image of excel icon with no pointer to the download link. However, when you click on it a file is downloaded. This could be a common HTML feature but I am not able to figure it out where the file is located. Even the source code is pointing out to icon image.
However my end goal is to able to download this file through python. I thought I could use beautifulsoup and with my limited knowledge on that I think I need to point to a download link. In this case I do not have one. So is there some other way to do it? May be I am missing something basic but any help on how to download this file would be great. I am not looking for a full code or even a working code. Just some pointers on how to go about it and which package to use. I can find my way once I know what I am suppose to use.

The task of clicking we can do it through the javascript, for this use selenium and the chromedriver.
Code:
from selenium import webdriver
chromedriver = '/usr/bin/chromedriver'
url = "http://www.bseindia.com/markets/equity/EQReports/MarketWatch.aspx?expandable=2"
chrome = webdriver.Chrome(chromedriver)
chrome.get(url)
chrome.execute_script("document.getElementById('ctl00_ContentPlaceHolder1_imgDownload').click();")

Related

Open File after navigating through website using Selenium

I'm at the final stage of this small project using Python/Selenium to navigate through a platform I use to bulk upload csv files.
I can't seem to find a way to specify the path/file location to then select (open) the file to upload to the website. I am trying to use the code below:
filepath = filedialog.askopenfilenames(initialdir="C:\\Users\\MyFolder")
subprocess.Popen('explorer "C:\my_file.csv"')
How do I select the correct file to then click 'open' after navigating through the website? Any help would be greatly appreciated. Thanks so much in advance.

So I found a work around to this issue. I used the below code to locate the input element instead:
s = driver.find_element(By.XPATH, "//input[#type='file']")
#file path specified with send_keys
s.send_keys(r"C:\Users\MyFolder\my_file.csv")
Here are some resources I used:
Selenium Documentation
Upload Files With Python
Handle Windows File Upload Using Selenium - Java
Thanks to anyone who wanted to help!

How to download a file in Python?

So the issue I am having isn't that there is a link of a PDF on the web I am trying to scrape and download onto my PC (It doesn't end in .pdf). I have a download link that I want to activate, which would then lead me to download a PDF onto my computer. It looks like this:
https://***.com/files/4122109/download?download_frd=1&verifier=xxx
When I click the link, it verifies I am the user that I am, and then lets me download the file with the ID contained in the above query. The content-type for this file is "application/pdf" so I know it downloads a PDF file for me. I just need a library that "clicks" or "activates" the download for me.
Also, I am trying to do this for all the URLs I am pulling from a course on Canvas in a GET request. I am not trying to use Selenium here because I am getting these URLs from an API. Any advice in this approach would be highly appreciated.

What code can I use to download a csv file that requires some steps after logging in to the website?

After some research on my problem, it seems I should use either requests or urllib or both.
So basically, I am trying to learn the code I need to download a csv file from this url:
https://globalaccess.sustainalytics.com/#/tools/0
The way I manually download my files is as follows: first, I need to log in using username and password. Next I have to go to a tab called "Screening" that takes to me another page that has several buttons called "Generate". I click a specific generate button (it's always the same one) among the option to get the excel file. After that I have the option to save the file or open from a little window within the website.
My question is what code can I use on Python to download and save the file in a particular folder?

Use Selenium
https://selenium-python.readthedocs.io/
You'll need to download a 'chromedriver' to the same directory as your python script, then use the intro tutorial on the selenium docs site to drive the browser to type/click where you want.
If you use chrome you can right click on any given link/input box click inspect, then in the window that comes up right click the bit of highlighted code and 'copy xpath'. Use the find element by xpath function in Selenium to send keys or clicks to that element.

Download image from website using chrome

I want to download some images (<10000) from a website.
I can't directly use the following python to directly download since the website requires username/password, it will give a 'HTTP Error 401: Unauthorized' error.
f = open(output_path, 'wb')
f.write(request.urlopen(full_image_url).read())
f.close()
So my current work around is to
log in to the website in Chrome - Easily Done
use python to parse (I manually copied the page source) and open lots of image links in chrome using
webbrowser.get(chrome_path).open(full_image_url) - This is Done
save the image from browser to local hard drive.
For step 3, I can manually right click each chrome tab to 'save image as'. But is there an automatic way to do so?
Any suggestion, help link, or other work around solution would be appreciated.

Have you tried logging into the site using python?
Take a look at this: How can I login to a website with Python?

urllib.urlretrieve() cannot download the file

Hi there's a button in the web, if you click it, it'll download a file.
Say the corresponding url is like this
http://www.mydata.com/data/filedownload.aspx?e=MyArgu1&k=kfhk22wykq
If i put this url in the address bar in the browser, it can download the file as well properly.
Now i do this in the python,
urllib.urlretrieve(url, "myData.csv")
The csv file is empty. Any suggestions please ?

This may not be possible with every website. If a link has a token then python is unlikely to be able to use the link as it is tied to your browser.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Download file from webpage which does not have a download link - python

Related

Open File after navigating through website using Selenium

How to download a file in Python?

What code can I use to download a csv file that requires some steps after logging in to the website?

Download image from website using chrome

urllib.urlretrieve() cannot download the file

Categories

Resources