Download image from website using chrome - python

I want to download some images (<10000) from a website.
I can't download them directly with the following Python, because the website requires a username and password; it fails with 'HTTP Error 401: Unauthorized'.
from urllib import request
with open(output_path, 'wb') as f:
    f.write(request.urlopen(full_image_url).read())
So my current workaround is to:
1. Log in to the website in Chrome (easily done).
2. Use Python to parse the page source (which I copied manually) and open lots of image links in Chrome using webbrowser.get(chrome_path).open(full_image_url) (this is done).
3. Save each image from the browser to the local hard drive.
For step 3, I can manually right-click each Chrome tab and choose 'Save image as'. But is there an automatic way to do so?
Any suggestion, helpful link, or other workaround would be appreciated.

Have you tried logging into the site using python?
Take a look at this: How can I login to a website with Python?
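If the 401 comes from HTTP Basic authentication (an assumption; a form-based login would need a session cookie instead), the images can be fetched from Python without going through the browser. A minimal sketch using only the standard library; build_auth_opener, local_name, and save_image are hypothetical helper names, and the URL and credentials are placeholders:

```python
import os
from urllib import request
from urllib.parse import urlsplit


def build_auth_opener(base_url, username, password):
    """Return an opener that answers HTTP Basic auth challenges for base_url."""
    password_mgr = request.HTTPPasswordMgrWithDefaultRealm()
    # None as the realm means "use these credentials for any realm at base_url".
    password_mgr.add_password(None, base_url, username, password)
    return request.build_opener(request.HTTPBasicAuthHandler(password_mgr))


def local_name(image_url):
    """Derive a local file name from the last path segment of the URL."""
    return os.path.basename(urlsplit(image_url).path) or "image"


def save_image(opener, image_url, output_dir):
    """Download one image through the authenticated opener and return its path."""
    output_path = os.path.join(output_dir, local_name(image_url))
    with opener.open(image_url) as response, open(output_path, "wb") as f:
        f.write(response.read())
    return output_path
```

With an opener built once, for example build_auth_opener("https://example.com/", "user", "secret"), save_image can be called in a loop over the parsed image links.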

Related

Download file when hit in the browser's url

I currently have links to a PDF and a DOCX file hosted on my server. Using Selenium, I opened headless Chrome and hit the DOCX link, and the document was downloaded, but the PDF is not being downloaded. The reason is that the PDF is displayed in the browser instead of being downloaded.
Example: when you click this link, you can view the image.
Is there any attribute or parameter that I can append to the URL so that the PDF (or image) gets downloaded?
One thing you could do is modify the browser settings first to force PDF downloads, for instance by opening chrome://settings/content/pdfDocuments and then pressing the #knob button to activate this setting.
This assumes you're using Chrome, but there may be analogous settings in other browsers.
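Since the question drives Chrome through Selenium, the same setting can probably be applied through Chrome preferences at startup instead of clicking through the settings page. A sketch, assuming Selenium 4 and the Chrome preference plugins.always_open_pdf_externally; pdf_download_prefs and make_driver are hypothetical names, and whether downloads work in headless mode depends on the Chrome version:

```python
def pdf_download_prefs(download_dir):
    """Chrome preferences that save PDFs to disk instead of opening the viewer."""
    return {
        "download.default_directory": download_dir,
        "download.prompt_for_download": False,
        "plugins.always_open_pdf_externally": True,
    }


def make_driver(download_dir):
    """Start Chrome with the preferences above (requires the selenium package)."""
    from selenium import webdriver  # imported lazily so the helper above has no dependency

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", pdf_download_prefs(download_dir))
    return webdriver.Chrome(options=options)
```

After make_driver("/path/to/downloads"), calling get() on the PDF link should save the file into that directory rather than render it.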

Download a .xls file from an aspx page using Python

I'm trying to download a .xls file from this site.
I need to somehow click the second button ("Exporta informácion diária") on the grid and download the .xls file.
I tried with requests and BeautifulSoup, but it didn't work.
After that, I tried Selenium just for some tests and managed to do what I needed.
Can someone please explain how I can download the .xls file without using a headless browser?
Thank You.
To do this, you first need to understand the flow of network requests that performs the download.
The easiest way is to open the developer tools in your browser and follow the relevant requests.
In your case, there is a POST request that returns the exact address of the file.
Download it with a GET request.
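The POST-then-GET flow can be sketched with the standard library alone. ASP.NET pages usually expect their hidden form fields (such as __VIEWSTATE) to be posted back, so the sketch collects them first. The button field name, and the assumption that the POST response body is the file address, are hypothetical; check the actual requests in the developer tools before relying on either.

```python
from html.parser import HTMLParser
from urllib import parse, request


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def hidden_fields(html):
    """Return the hidden form fields found in an HTML string."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


def download_export(page_url, button_field, output_path):
    """POST the form back, then GET the file address the server returns."""
    opener = request.build_opener(request.HTTPCookieProcessor())
    with opener.open(page_url) as page:
        form = hidden_fields(page.read().decode("utf-8", errors="replace"))
    form[button_field] = ""  # hypothetical: the field that identifies the clicked button
    body = parse.urlencode(form).encode("ascii")
    with opener.open(page_url, data=body) as response:  # passing data makes this a POST
        # Assumption: the response body is the (possibly relative) file address.
        file_url = parse.urljoin(page_url, response.read().decode("utf-8").strip())
    with opener.open(file_url) as download, open(output_path, "wb") as f:  # plain GET
        f.write(download.read())
```

The cookie processor keeps the session between the three requests, which such pages often require.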

Download file from webpage which does not have a download link

I am trying to download this excel file using Python.
http://www.bseindia.com/markets/equity/EQReports/MarketWatch.aspx?expandable=2. The excel file is on the right side in the box which says "Top Turnovers - All Market".
I am not an HTML expert, but usually every file I see embedded in a web page has a download link (visible when I right-click the download button). This one is just an image of an Excel icon with no pointer to a download link. However, when you click on it, a file is downloaded. This could be a common HTML feature, but I am not able to figure out where the file is located. Even the page source points only to the icon image.
My end goal is to be able to download this file through Python. I thought I could use BeautifulSoup, and with my limited knowledge of it I think I need to point to a download link. In this case I do not have one. So is there some other way to do it? Maybe I am missing something basic, but any help on how to download this file would be great. I am not looking for full code or even working code. Just some pointers on how to go about it and which package to use. I can find my way once I know what I am supposed to use.
The click can be triggered through JavaScript; for this, use Selenium and ChromeDriver.
Code:
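from selenium import webdriver
from selenium.webdriver.chrome.service import Service

chromedriver = '/usr/bin/chromedriver'  # path to the ChromeDriver binary
url = "http://www.bseindia.com/markets/equity/EQReports/MarketWatch.aspx?expandable=2"
# Selenium 4 takes the driver path through a Service object
chrome = webdriver.Chrome(service=Service(chromedriver))
chrome.get(url)
# Click the Excel icon through JavaScript to trigger the download
chrome.execute_script("document.getElementById('ctl00_ContentPlaceHolder1_imgDownload').click();")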
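The click only starts the download; execute_script returns before the file has been written. One way to wait for it is sketched below, assuming Chrome marks in-progress downloads with a .crdownload suffix and that the download directory is known and empty beforehand. wait_for_download is a hypothetical helper, not part of Selenium:

```python
import os
import time


def wait_for_download(download_dir, timeout=30.0, poll=0.5):
    """Return the finished file names in download_dir, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        names = os.listdir(download_dir)
        # Chrome keeps a partial file with this suffix until the download completes.
        in_progress = [n for n in names if n.endswith(".crdownload")]
        if names and not in_progress:
            return sorted(names)
        time.sleep(poll)
    raise TimeoutError(f"no completed download in {download_dir}")
```

Calling wait_for_download on the download directory right after the execute_script line gives the script a point at which it is safe to open the file or quit the browser.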

Selenium-Phantomjs download csv

I am trying to download (save to disk) a CSV file using PhantomJS, from a dialog box. Using a Firefox profile, this would be fairly simple by setting the browser profile properties. Any suggestions on how an Excel file could be downloaded in PhantomJS?
This is how it would be done using firefox driver:
profile = webdriver.firefox.firefox_profile.FirefoxProfile()
profile.set_preference("browser.download.folderList",2)
profile.set_preference("browser.download.dir",self.opts['output_dir'])
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', ('application/octet-stream,application/msexcel'))
I am using Phantomjs driver:
webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true','--local-storage-path=/tmp'])
and I am looking for a way to set properties that can override the save-to-disk behavior and set the MIME type of the data. Currently, without these properties set, the PhantomJS driver does not download the file.
I have read links about avoiding dialog box etc but in this case, it is needed.
I was recently struggling with a similar issue. I ended up switching to a different web driver because it offers relatively easy access to network traffic. In PhantomJS, if a file is not directly on the page but is instead transferred over the network, you cannot see it. There are a few people working on workarounds, but I found that most of my files were being transferred this way, so it was easier for me to gather network traffic with WebDriver + Firebug + NetExport.
However, a very hacky way to do this in PhantomJS would be something like this:
phantomjs.exe file_to_run.js > my_log.txt
Here you simply save the console contents to a file. However, you are likely to get errors and other messages in that file. You could clean it up, since you're only looking for the CSV data.
From my understanding, PhantomJS is limited because the developer has a very specific idea of how it should work. For example, they discontinued Flash support. There is no easy native way of downloading and saving files like there is in Firefox. You could launch another web browser and download through that. However, I think the easiest way to do this is to use CasperJS, which plays nicely with PhantomJS.
A good example of using casperJS to download files can be found here: casperjs download csv file
I believe the major issue with using CasperJS, however, is that large files are not well supported. Is there a specific reason you prefer to use a headless browser?

Grabbing a .jsp generated PNG in Python

I am trying to grab a PNG image which is being dynamically generated with JSP in a web service.
I have tried visiting the web page it is contained in and grabbing the image src attribute; but the link leads to a .jsp file. Reading the response with urllib2 just shows a lot of gibberish.
I also need to do this while logged into the web service in question, using mechanize. This seems to exclude the option of grabbing a screenshot with webkit2png or similar.
Thanks for any suggestions.
If you use urllib correctly (for example, making sure your User-Agent resembles a browser etc), the "gibberish" you get back is the actual file, so you just need to write it out to disk (open the file with "wb" for writing in binary mode) and re-read it with some image-manipulation library if you need to play with it. Or you can use urlretrieve to save it directly on the filesystem.
If that's a jsp, chances are that it takes parameters, which might be appended by the browser via javascript before the request is done; you should look at the real request your browser makes, before trying to reproduce it. You can do that with the Chrome Developer Tools, Firefox LiveHTTPHeaders, etc etc.
I do hope you're not trying to break a captcha.
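The first suggestion in this answer can be sketched as follows: send a browser-like User-Agent, check that the returned bytes start with the PNG signature, and write them out in binary mode. This uses Python 3's urllib.request rather than urllib2; is_png and fetch_image are hypothetical names, the User-Agent string is a placeholder, and login or session handling is left out.

```python
from urllib import request

# Every PNG file starts with these eight bytes.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_png(data):
    """Return True if the bytes begin with the PNG file signature."""
    return data.startswith(PNG_SIGNATURE)


def fetch_image(url, output_path, user_agent="Mozilla/5.0"):
    """Fetch url, verify it is a PNG, and save it; return the byte count."""
    req = request.Request(url, headers={"User-Agent": user_agent})
    with request.urlopen(req) as response:
        data = response.read()
    if not is_png(data):
        raise ValueError("response is not a PNG; compare with the browser's real request")
    with open(output_path, "wb") as f:  # binary mode, so the bytes are written unchanged
        f.write(data)
    return len(data)
```

If the check fails, the .jsp most likely expects parameters or cookies that the browser adds, which is where inspecting the real request helps.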
