Web scraping a site that downloads a file in Python

I'm trying to scrape a table from a website.
I can get the XPath of the table to use with the Selenium driver, but the website already offers the same table as a downloadable file.
When I click an icon, it opens a Save As dialog box.
Using Selenium, how can I download this file and save it directly, without going through the dialog?

According to older posts, this requires emulating some user input with pyautogui, probably just pressing Enter on the keyboard.
https://stackoverflow.com/a/58432097/19042045
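Instead of emulating the Enter key with pyautogui, another option (a swapped-in technique, not the one from the linked post) is to tell Chrome not to show the Save As prompt at all. The sketch below assumes Chrome and the Selenium 4 Python bindings; the helper names chrome_download_prefs and make_chrome are hypothetical, while the preference keys are standard Chrome profile preferences.

```python
import os


def chrome_download_prefs(download_dir):
    """Chrome profile preferences that save downloads without a Save As prompt."""
    return {
        # Folder where Chrome writes downloaded files (absolute path).
        "download.default_directory": os.path.abspath(download_dir),
        # Do not ask where to save each file.
        "download.prompt_for_download": False,
        # Apply the new directory even if the profile already had one.
        "download.directory_upgrade": True,
    }


def make_chrome(download_dir):
    """Start Chrome with the download preferences applied (hypothetical helper)."""
    # Imported here so the prefs helper above works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", chrome_download_prefs(download_dir))
    return webdriver.Chrome(options=options)
```

Usage would be driver = make_chrome("downloads"), then driver.get(url) and a normal click on the download icon; if the preferences take effect, the file lands in ./downloads without any dialog.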

Related

How to get the value of an XPath in an already open tab with Python

I just got into web scraping, but every question or tutorial has you open a new tab with Python. The website I would like to use is called nitro typer, and I already know how to get the XPath. The issue is that I want to get the value at that path from a tab that is already open. I have only used Selenium.
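One way to reuse an already open browser (not covered in the original thread) is Chrome's remote debugging: start Chrome yourself with a debugging port, then have Selenium attach to it through the debuggerAddress option instead of launching a new window. This is a sketch under assumptions: Chrome, Selenium 4, port 9222, and a hypothetical profile folder name; recent Chrome releases only honor the debugging flag with a non-default --user-data-dir.

```python
import os


def debug_launch_args(port=9222, profile_dir="chrome-debug-profile"):
    """Command-line flags for starting Chrome so Selenium can attach later."""
    return [
        f"--remote-debugging-port={port}",
        # Recent Chrome versions require a non-default profile directory
        # for remote debugging; "chrome-debug-profile" is a hypothetical name.
        f"--user-data-dir={os.path.abspath(profile_dir)}",
    ]


def attach_to_chrome(port=9222):
    """Connect Selenium to a Chrome instance already listening on `port`."""
    from selenium import webdriver  # imported lazily; only needed to attach

    options = webdriver.ChromeOptions()
    # Tell chromedriver to reuse the running browser instead of launching one.
    options.add_experimental_option("debuggerAddress", f"127.0.0.1:{port}")
    return webdriver.Chrome(options=options)
```

After launching Chrome with those flags and opening the page manually, driver = attach_to_chrome() should give a driver for that browser; driver.window_handles and driver.switch_to.window(...) select the right tab before calling find_element.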

Download a file when its URL is opened in the browser

I have links to a PDF and a DOC file hosted on my server. Using Selenium, I opened headless Chrome and loaded the DOCX link, and the document was downloaded, but the PDF was not: Chrome displays the PDF in the browser instead of downloading it.
Example: when you click this link, you can view the image.
Is there some attribute or parameter I can append to the URL so that the PDF (or image) gets downloaded?
One thing you could do is change the browser settings first to force PDF downloads, for instance by opening chrome://settings/content/pdfDocuments and clicking the toggle (the #knob element) that enables this setting.
This assumes you're using Chrome, but other browsers may have analogous settings.
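Rather than clicking through chrome://settings, the same behavior can likely be set as a profile preference when Selenium starts Chrome. A minimal sketch, assuming Selenium 4 and a Chrome version that supports the "new" headless mode; plugins.always_open_pdf_externally is the preference that, to my knowledge, backs the "Download PDFs" setting, and the helper names are hypothetical.

```python
import os


def chrome_pdf_download_prefs(download_dir):
    """Chrome preferences that download PDFs instead of opening the viewer."""
    return {
        # Should correspond to "Download PDFs" in chrome://settings/content/pdfDocuments.
        "plugins.always_open_pdf_externally": True,
        "download.default_directory": os.path.abspath(download_dir),
        "download.prompt_for_download": False,
    }


def make_headless_chrome(download_dir):
    """Launch headless Chrome with the PDF download preferences (hypothetical helper)."""
    from selenium import webdriver  # imported lazily so the prefs helper stands alone

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    options.add_experimental_option("prefs", chrome_pdf_download_prefs(download_dir))
    return webdriver.Chrome(options=options)
```

With this, driver.get(pdf_url) should save the PDF into download_dir instead of rendering it, the same way the DOCX link already behaves.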

What code can I use to download a CSV file that requires some steps after logging in to the website?

After some research on my problem, it seems I should use either requests or urllib or both.
Basically, I am trying to learn the code needed to download a CSV file from this URL:
https://globalaccess.sustainalytics.com/#/tools/0
This is how I download my files manually: first, I log in with a username and password. Next, I go to a tab called "Screening", which takes me to another page with several buttons labeled "Generate". I click a specific Generate button (always the same one) among the options to get the Excel file. After that, a small window within the website lets me save the file or open it.
My question is what code can I use on Python to download and save the file in a particular folder?
Use Selenium
https://selenium-python.readthedocs.io/
You'll need a chromedriver executable that matches your Chrome version, available on your PATH (or pass its path to Selenium explicitly); Selenium 4.6 and later can also fetch it automatically via Selenium Manager. Then follow the intro tutorial on the Selenium docs site to drive the browser to type and click where you want.
If you use Chrome, you can right-click any link or input box and choose Inspect, then in the panel that opens, right-click the highlighted code and choose Copy > Copy XPath. Use Selenium's find-element-by-XPath lookup (driver.find_element(By.XPATH, ...) in Selenium 4) to send keys or clicks to that element.
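The login, Screening, and Generate sequence described in the question can be expressed as an ordered list of XPath lookups. This is a sketch only: every XPath below is a hypothetical placeholder to be replaced with the ones copied from Inspect, the credentials are dummies, and a real single-page app will probably need explicit waits (WebDriverWait) between steps.

```python
# Same string as selenium.webdriver.common.by.By.XPATH; a plain constant
# keeps this sketch importable without Selenium installed.
XPATH = "xpath"

# Hypothetical XPaths and credentials: replace with your own values.
SCREENING_STEPS = [
    ("//input[@name='username']", "type", "my-user"),
    ("//input[@name='password']", "type", "my-password"),
    ("//button[@type='submit']", "click", None),
    ("//a[text()='Screening']", "click", None),
    ("(//button[text()='Generate'])[1]", "click", None),
]


def perform_steps(driver, steps):
    """Locate each element by XPath and type into it or click it, in order."""
    for xpath, action, value in steps:
        element = driver.find_element(XPATH, xpath)
        if action == "type":
            element.send_keys(value)
        elif action == "click":
            element.click()
        else:
            raise ValueError(f"unknown action: {action!r}")
```

Keeping the steps as data makes it easy to adjust a single XPath when the site changes, without touching the loop.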

Download file from webpage which does not have a download link

I am trying to download this Excel file using Python.
http://www.bseindia.com/markets/equity/EQReports/MarketWatch.aspx?expandable=2. The Excel file is in the box on the right side labeled "Top Turnovers - All Market".
I am not an HTML expert, but usually every downloadable file I see on the web has a download link (when I right-click the download button). This one is just an image of an Excel icon with no link to the file. However, when you click it, a file is downloaded. This may be a common HTML feature, but I cannot figure out where the file is located; even the source code only points to the icon image.
My end goal is to download this file with Python. I thought I could use BeautifulSoup, but with my limited knowledge I think I need a download link to point to, and in this case I don't have one. Is there some other way to do it? Maybe I am missing something basic, but any help on how to download this file would be great. I am not looking for full or even working code, just some pointers on how to approach it and which package to use. I can find my way once I know what I am supposed to use.
The click can be performed through JavaScript; for this, use Selenium and chromedriver.
Code:
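from selenium import webdriver
from selenium.webdriver.chrome.service import Service
chromedriver = '/usr/bin/chromedriver'  # path to the chromedriver binary
url = "http://www.bseindia.com/markets/equity/EQReports/MarketWatch.aspx?expandable=2"
# Selenium 4 takes the driver path through a Service object
chrome = webdriver.Chrome(service=Service(chromedriver))
chrome.get(url)
# Click the Excel icon from JavaScript, which starts the download
chrome.execute_script("document.getElementById('ctl00_ContentPlaceHolder1_imgDownload').click();")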
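The click only starts the download, so a script that quits the browser right away can cut it off. A minimal sketch for waiting, assuming Chrome was configured to save into a known, initially empty folder (for example through the download.default_directory preference) and that in-progress files carry Chrome's .crdownload suffix (Firefox uses .part); wait_for_download is a hypothetical helper name.

```python
import os
import time


def wait_for_download(download_dir, timeout=30.0, poll=0.5):
    """Return the path of the newest finished download in download_dir.

    In-progress downloads (.crdownload for Chrome, .part for Firefox)
    are skipped until the browser renames them to their final name.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        finished = [
            os.path.join(download_dir, name)
            for name in os.listdir(download_dir)
            if not name.endswith((".crdownload", ".part"))
        ]
        if finished:
            # Newest file by modification time is assumed to be ours.
            return max(finished, key=os.path.getmtime)
        time.sleep(poll)
    raise TimeoutError(f"no finished download in {download_dir} after {timeout}s")
```

Calling wait_for_download("downloads") after the execute_script line, and only then chrome.quit(), gives the file time to finish writing.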

Selenium "save file" popup issue using Python

I am working on an HTML page with Selenium. After I click the last link, a popup appears asking me to save a file.
I record all the events with Selenium and then generate a Selenium RC script from them.
How can I get the file from that popup in code, using Python?
When saving a file, you can avoid the popup by configuring the options of your browser profile. See this answer for an explanation using Firefox. The general idea is to tell Firefox itself not to prompt when saving files of certain types. The file will still be saved somewhere, but you can control where it goes, in case you want to delete it afterwards (or handle it separately in Python).
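With the current Selenium 4 Python bindings, that Firefox profile idea can be sketched through FirefoxOptions.set_preference. The preference names are standard Firefox ones; the MIME types are assumptions and must match the Content-Type header the server actually sends, and the helper names are hypothetical.

```python
import os


def firefox_download_prefs(download_dir, mime_types=("application/vnd.ms-excel", "text/csv")):
    """Firefox preferences that save the given MIME types without prompting."""
    return {
        # 2 = save into the custom folder given by browser.download.dir
        "browser.download.folderList": 2,
        "browser.download.dir": os.path.abspath(download_dir),
        # Comma-separated MIME types to save to disk without a dialog.
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
    }


def make_firefox(download_dir):
    """Launch Firefox with the download preferences (hypothetical helper)."""
    from selenium import webdriver  # imported lazily so the prefs helper stands alone

    options = webdriver.FirefoxOptions()
    for name, value in firefox_download_prefs(download_dir).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

If the popup still appears, checking the response's Content-Type in the browser's network tab and adding that exact type to mime_types is usually the first thing to try.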
WebDriver cannot interact with the browser's native modal download dialog.
However, this can be worked around; see the link below.
http://blog.codecentric.de/en/2010/07/file-downloads-with-selenium-mission-impossible/
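One workaround in this spirit (not necessarily the exact method of the linked article) is to skip the dialog entirely: copy the logged-in browser's cookies into a requests session and fetch the file URL directly. This sketch assumes you know the file's URL (for example from a link's href) and that the server authenticates by cookie; requests is a third-party package, and the helper names are hypothetical.

```python
def to_requests_cookie_args(selenium_cookie):
    """Map one driver.get_cookies() entry to RequestsCookieJar.set() arguments."""
    return (
        selenium_cookie["name"],
        selenium_cookie["value"],
        {
            "domain": selenium_cookie.get("domain", ""),
            "path": selenium_cookie.get("path", "/"),
        },
    )


def download_with_browser_session(driver, file_url, dest_path):
    """Fetch file_url with the browser's cookies and save it to dest_path."""
    import requests  # third-party; imported lazily

    with requests.Session() as session:
        for cookie in driver.get_cookies():
            name, value, extra = to_requests_cookie_args(cookie)
            session.cookies.set(name, value, **extra)
        # Some servers also check the User-Agent, so reuse the browser's.
        session.headers["User-Agent"] = driver.execute_script("return navigator.userAgent")
        response = session.get(file_url, timeout=60)
        response.raise_for_status()
    with open(dest_path, "wb") as fh:
        fh.write(response.content)
    return dest_path
```

This keeps Selenium for the login and navigation steps and leaves the actual file transfer to plain HTTP, which is easier to control from Python.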
