Download HAR file using script from Chrome - python

I want to download the HAR file containing information about all the resources that the browser downloads for a particular site. Currently, I am opening a webpage in Chrome through a Python script. Now I want my script to automatically download the HAR file after the webpage has completely loaded.
I have already searched through many sources, but didn't find anything useful.
Any help will be appreciated.
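One possible approach, sketched under assumptions: Chrome driven by Selenium can record DevTools network events in its "performance" log, and those events can be folded into a HAR-like file once `driver.get` returns. The function names `har_from_performance_log` and `capture_har` are hypothetical, and the output is only a simplified subset of HAR 1.2 (no timings, headers, or bodies), so treat it as a starting point rather than a complete exporter.

```python
import json


def har_from_performance_log(log_entries):
    """Fold Chrome performance-log entries into a minimal HAR-like dict.

    Pairs Network.requestWillBeSent with Network.responseReceived by
    requestId. Redirects reuse a requestId, so only the last hop is kept.
    """
    by_id = {}
    for entry in log_entries:
        message = json.loads(entry["message"])["message"]
        method = message.get("method")
        params = message.get("params", {})
        request_id = params.get("requestId")
        if method == "Network.requestWillBeSent":
            request = params["request"]
            by_id[request_id] = {
                "request": {"method": request["method"], "url": request["url"]},
                "response": None,
            }
        elif method == "Network.responseReceived" and request_id in by_id:
            response = params["response"]
            by_id[request_id]["response"] = {
                "status": response["status"],
                "mimeType": response.get("mimeType", ""),
            }
    return {
        "log": {
            "version": "1.2",
            "creator": {"name": "performance-log sketch", "version": "0"},
            "entries": list(by_id.values()),
        }
    }


def capture_har(url, out_path):
    """Open url in Chrome, then write the HAR-like file to out_path."""
    # Imported here so the conversion function above works without Selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    # Ask chromedriver to record DevTools events in the "performance" log.
    options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)  # returns once the page's load event has fired
        har = har_from_performance_log(driver.get_log("performance"))
    finally:
        driver.quit()
    with open(out_path, "w") as f:
        json.dump(har, f, indent=2)
```

Calling `capture_har("https://example.com/", "page.har")` would then write the file. Resources requested after the load event (lazy loading, XHR polling) may be missing unless the script waits before reading the log.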

Related

TIFF File Download Automation in EarthExplorer

I'm working on a project to download cropped tiff files for the following area from this website https://earthexplorer.usgs.gov/.
The crop will generate 417 datasets (tiff files) which are divided into 10 pages and ready to be downloaded by clicking the download icon in the following image.
Then a choice of file formats to download will appear and I choose GeoTIFF 1 Arc-second Download.
The website requires login to download the files.
Because there are too many files that I have to download, I want to automate the download process. I have used selenium and chrome web driver in python to do this task. But due to limited information I have not been able to complete it.
Any suggestions on how to finish this job? Or is there a similar project reference? Your information means a lot to me. Thank you

Download Opened PDF with Python

I'm working on a program that will use Selenium/WebDriver to open a webpage, enter some data, and open a new page which is a PDF. Ultimately, I would like to download that PDF into a folder. I know it is possible to download a PDF into a folder if you have the URL in your script, but I'm struggling to find a way to download it when it is opened within the program.
A) Is there a way to download a PDF that is opened explicitly in Chrome using a script?
B) Is there a way to extract the URL from an opened webpage so that it can then be fed back into the program to download from?
While I was doing a selenium project, I faced a similar issue.
I would click on a link that was referring to a PDF file but instead of downloading, the selenium chromedriver would just open it in a new tab.
What solved my problem was that right after I started new chromedriver session, I manually disabled this feature:
In your Chrome settings, go to Privacy and Security
Select Site Settings
Scroll down and click on Additional preferences
Find a section named 'PDF documents'
Turn on the option that says "Download PDF files instead of automatically opening them in Chrome"
Now, any PDF link you click on will download the file instead of opening them in a new tab. Note that you need to do this every time you start a new chromedriver. Changing this setting in your main Chrome application won't help.
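Since the setting has to be reapplied for every new chromedriver session, it may be easier to set it from the script. The sketch below assumes the manual toggle above corresponds to the Chrome preference `plugins.always_open_pdf_externally`; the helper names `pdf_download_prefs` and `start_chrome_for_pdf_download` are hypothetical.

```python
def pdf_download_prefs(download_dir):
    """Chrome preferences that save PDFs to download_dir instead of opening them."""
    return {
        # Assumed equivalent of "Download PDF files instead of
        # automatically opening them in Chrome".
        "plugins.always_open_pdf_externally": True,
        # Should be an absolute path.
        "download.default_directory": download_dir,
        "download.prompt_for_download": False,
    }


def start_chrome_for_pdf_download(download_dir):
    """Start a chromedriver session with the preferences above applied."""
    # Imported here so the prefs helper can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", pdf_download_prefs(download_dir))
    return webdriver.Chrome(options=options)
```

For question B, `driver.current_url` returns the address of the currently focused tab, which can be passed to a regular HTTP download if the PDF does not require the browser session.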

How to download a FULL HTML Google Drive folder page into a variable?

I cannot download the complete HTML code of a Google Drive folder page, which I need in order to find the ID codes for downloading public files from that folder. If I open the page and save it through the Mozilla Firefox browser, everything is in the HTML code. The link to the Google Drive folder is in the example code below. All of this is done as an unregistered Google user. These are public files and public folders.
The file that I can find in the HTML saved by Mozilla Firefox, but not in the HTML fetched through wget or Python, has the name:
piconwhite-220x132-freeSAT..........(insignificant remaining part of file name)
Here is an example of the Python code I use (urllib2 module), in which the file does not show up:
import urllib2
u_handle = urllib2.urlopen('https://drive.google.com/drive/folders/0Bwz6mBA7lUOKZi1nbGdlbzFDZ0U')
htmlPage = u_handle.read()
with open('/tmp/test.html','w') as f:
    f.write(htmlPage)
If I download a html page using a web browser, the html file size is about 500kB and also contains the above mentioned file to uncover the download code. If I download the webpage through wget or through the Python urllib2 module, the downloaded html code has a size of only 213kB and does not contain the mentioned file.
BTW, I tried several WGET methods (via linux shell - command line) but there is the same situation - that is, always downloading HTML with a certain number of maximum files from the content (unfortunately, not all files there).
Thank you for all the advice.
P.S.
I'm not a good web developer and I'm looking for a solution to the problem. I'm a developer in other languages and on other platforms.
So, I resolved my own problem by downloading a different drive.google webpage as a shortened form of directory / file list. I use this new URL:
'https://drive.google.com/embeddedfolderview?id=0Bwz6mBA7lUOKZi1nbGdlbzFDZ0U#list'
Instead of the previous URL:
'https://drive.google.com/drive/folders/0Bwz6mBA7lUOKZi1nbGdlbzFDZ0U'
The source code of the "list" site is slightly different, but it has a lot of records (lots of directories or files on drive.google page). So I can see all the files or all the directories that are on the required drive.google website.
Thank you all for helping me or for reading my problem.
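A rough Python 3 sketch of this workaround: build the embeddedfolderview URL from the folder ID, fetch it, and collect every link on the page. The markup of the list page is not shown here, so the code only gathers all `href` values and leaves the filtering to the caller; `folder_list_url`, `extract_links`, and `list_folder_links` are hypothetical names.

```python
import urllib.request
from html.parser import HTMLParser


def folder_list_url(folder_id):
    """URL of the shortened list view for a public Drive folder."""
    return "https://drive.google.com/embeddedfolderview?id=" + folder_id + "#list"


class _LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return all link targets found in html_text, in document order."""
    collector = _LinkCollector()
    collector.feed(html_text)
    return collector.links


def list_folder_links(folder_id):
    """Fetch the list view and return the links it contains."""
    with urllib.request.urlopen(folder_list_url(folder_id)) as response:
        html_text = response.read().decode("utf-8", errors="replace")
    return extract_links(html_text)
```

`list_folder_links("0Bwz6mBA7lUOKZi1nbGdlbzFDZ0U")` would return the raw links, from which the file IDs still have to be picked out by inspecting the actual page.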

How to prevent or manage download while using driver.get in Selenium

I have some URLs which return XML content but do not end in .xml, and I want to open them with Chrome driver + Selenium. Every time I try, Chrome downloads the files.
A download is fine, but I somehow need to name the downloads in a specific way.
The other option would be to prevent the download and then save the content later, or work with the files somehow before I save them somewhere.
How can I name the download or, even better, prevent it and just get the content of any URL called?
I use Python.
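If the URLs do not depend on the browser session (no login cookies or JavaScript), one option is to skip the browser for these requests and fetch the content directly, which lets the script choose the file name. This is a minimal sketch under that assumption; `fetch_and_save` is a hypothetical helper.

```python
import urllib.request
from pathlib import Path


def fetch_and_save(url, target_path):
    """Fetch url, write the raw bytes to target_path, and return them."""
    with urllib.request.urlopen(url) as response:
        content = response.read()
    # The caller decides the file name, independent of what the URL looks like.
    Path(target_path).write_bytes(content)
    return content
```

The returned bytes can be parsed or inspected before or instead of saving, for example with `xml.etree.ElementTree.fromstring(content)`.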

Python program to download music files

I am learning Python and building a small tool to download music files with it. I have two questions.
The following webpage has three download links. http://mp3monkey.net/audiod/147455106/186823954/Zeds_Dead_-Demons_.mp3
If I click on the second one (green), an mp3 file gets downloaded on my computer.
However, that download link points to the following link. http://mp3monkey.net/audio/147455106/186823954/Zeds_Dead_-Demons_.mp3?dl=2
If I try to open that link on a separate tab, it does not work, the webpage says "Hotlink Protection. Visit our Website Directly to Download the Song".
What is happening? Why does clicking directly on the download button download the file, while opening the same link in a new tab fails to download it?
I was reading the following post
How do I download a file over HTTP using Python?
This method does not work on the above link. Any idea why?
import urllib
urllib.urlretrieve("second link", "test.mp3")
This downloads a corrupt file of size 11kb.
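A likely explanation, though not confirmed for this site, is that the server checks the `Referer` header: a click on the page sends the page's address as the referrer, while a new tab or a plain urllib call sends none, so the server returns its "Hotlink Protection" HTML page (which would also explain the small "corrupt" file). The Python 3 sketch below sends the header explicitly; `build_request` and `download` are hypothetical names, and the site may apply other checks as well.

```python
import urllib.request


def build_request(file_url, page_url):
    """Request for file_url that names page_url as the referring page."""
    return urllib.request.Request(file_url, headers={"Referer": page_url})


def download(file_url, page_url, target_path):
    """Download file_url to target_path, sending page_url as the Referer."""
    request = build_request(file_url, page_url)
    with urllib.request.urlopen(request) as response, open(target_path, "wb") as out:
        out.write(response.read())
```

Here `file_url` would be the link ending in `?dl=2` and `page_url` the page that shows the download buttons.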
