Download a .xls file from an aspx page using Python - python

I'm trying to download a .xls file from this site.
I need to somehow click the second button ("Exporta informácion diária") on the grid and download the .xls file.
I tried requests and BeautifulSoup, but that didn't work.
After that, I tried Selenium just for some tests and managed to do what I needed.
Can someone please explain how I can download the .xls file without using a headless browser?
Thank you.

To do this, you first need to understand the flow of network requests that performs the download.
The easiest way is to open the developer tools in your browser and follow the relevant requests.
In your case, there is a POST request that returns the exact address of the file.
Download it with a GET request.
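The POST-then-GET flow described above could be sketched roughly as follows, using only the standard library. Everything specific here is an assumption, not something taken from the site: the function names are hypothetical, the form fields must be copied from the POST request shown in the developer tools (ASP.NET pages usually expect hidden fields such as __VIEWSTATE), and the sketch assumes the file address appears in the POST response as a quoted string ending in .xls.

```python
import re
import urllib.parse
import urllib.request


def find_xls_url(response_text, base_url):
    """Return the first quoted .xls address in the POST response, or None.

    Assumption: the server returns the address as a quoted string.
    """
    match = re.search(r"""["']([^"']+\.xls)["']""", response_text, re.IGNORECASE)
    if match is None:
        return None
    # The address may be relative, so resolve it against the page URL.
    return urllib.parse.urljoin(base_url, match.group(1))


def download_xls(page_url, form_fields, out_path):
    """POST the form fields, then GET the file address the server returns."""
    body = urllib.parse.urlencode(form_fields).encode("ascii")
    # Passing data= makes urllib send a POST request.
    post_request = urllib.request.Request(page_url, data=body)
    with urllib.request.urlopen(post_request) as response:
        text = response.read().decode("utf-8", errors="replace")

    file_url = find_xls_url(text, page_url)
    if file_url is None:
        raise ValueError("no .xls address found in the POST response")

    # Plain GET for the file itself; write the bytes unchanged.
    with urllib.request.urlopen(file_url) as response, open(out_path, "wb") as out:
        out.write(response.read())
    return file_url
```

A call would look like download_xls(page_url, fields, "report.xls"), where page_url and fields are whatever the browser actually sent. If the server also relies on cookies, the same idea applies, but with an opener that keeps a cookie jar.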

Related

How to read excel file into python from a particular website

I am trying to read several Excel files available on this website, https://www.motilaloswalmf.com/download/month-end-portfolio, using Python's requests library. However, I am not able to figure out the exact URL for downloading the Excel files from the network tab.
Can someone please help with this? Thank you!
The given site needs JS rendering to extract the Excel file links. You need Selenium or Playwright to achieve your goal.

How to download a file in Python?

My issue is not the usual one of scraping a direct link to a PDF and downloading it onto my PC (the link doesn't end in .pdf). I have a download link that I want to activate, which would then download a PDF onto my computer. It looks like this:
https://***.com/files/4122109/download?download_frd=1&verifier=xxx
When I click the link, it verifies I am the user that I am, and then lets me download the file with the ID contained in the above query. The content-type for this file is "application/pdf" so I know it downloads a PDF file for me. I just need a library that "clicks" or "activates" the download for me.
Also, I am trying to do this for all the URLs I am pulling from a course on Canvas in a GET request. I am not trying to use Selenium here because I am getting these URLs from an API. Any advice in this approach would be highly appreciated.

Download image from website using chrome

I want to download some images (fewer than 10,000) from a website.
I can't use the following Python code to download them directly, since the website requires a username and password; it gives an 'HTTP Error 401: Unauthorized' error.
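from urllib import request

# Fetch the image and write the raw bytes to disk; the file is closed automatically.
with open(output_path, 'wb') as f:
    f.write(request.urlopen(full_image_url).read())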
So my current workaround is to:
1. Log in to the website in Chrome (easily done).
2. Use Python to parse the page source (which I copied manually) and open lots of image links in Chrome using webbrowser.get(chrome_path).open(full_image_url) (this is done).
3. Save the image from the browser to the local hard drive.
For step 3, I can manually right-click each Chrome tab and choose 'Save image as'. But is there an automatic way to do so?
Any suggestion, helpful link, or other workaround would be appreciated.
Have you tried logging in to the site using Python?
Take a look at this: How can I login to a website with Python?
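As a rough sketch of what logging in from Python could look like: a 401 response often means the server expects HTTP authentication, so the example below assumes HTTP Basic auth. That is an assumption about the site, and the function names are hypothetical. A site with a form-based login would instead need the credentials posted to its login form and the session cookie kept.

```python
import urllib.request


def build_authenticated_opener(site_url, username, password):
    """Return an opener that answers HTTP Basic auth challenges for site_url."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None: use these credentials for any realm under site_url.
    password_mgr.add_password(None, site_url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(handler)


def save_image(opener, image_url, output_path):
    """Download image_url through the authenticated opener and save it."""
    with opener.open(image_url) as response, open(output_path, "wb") as f:
        f.write(response.read())
```

With an opener built once for the site, save_image(opener, full_image_url, output_path) can be called in a loop over the parsed image links, which removes the need to open each one in Chrome.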

python mechanize blank download or how to do it in casperjs

I am downloading information for a research project from a site that uses AJAX to load URLs and does not allow serial downloading. I am dumping the URLs from casperjs into a file, which I read and pass to browser.retrieve(url, dump_filename) to download the information with mechanize. I mostly get blank file downloads, but they are periodically filled with content. Is there a way to modify the headers so that I always get data? A casperjs download alternative is also welcome. I have tried casperjs download(), but it saves a blank file as well. I think it has something to do with the headers. File downloads always work in a browser.
I prefer Selenium over Mechanize when it comes to more "sophisticated" websites that use AJAX, JS, etc.
You said downloading works when you're using your browser. Selenium does the same thing: it uses Firefox on your desktop to carry out its tasks.

Grabbing a .jsp generated PNG in Python

I am trying to grab a PNG image which is being dynamically generated with JSP in a web service.
I have tried visiting the web page it is contained in and grabbing the image src attribute; but the link leads to a .jsp file. Reading the response with urllib2 just shows a lot of gibberish.
I also need to do this while logged into the web service in question, using mechanize. This seems to exclude the option of grabbing a screenshot with webkit2png or similar.
Thanks for any suggestions.
If you use urllib correctly (for example, making sure your User-Agent resembles a browser's), the "gibberish" you get back is the actual file, so you just need to write it out to disk (open the file with "wb" to write in binary mode) and re-read it with an image-manipulation library if you need to work with it. Or you can use urlretrieve to save it directly to the filesystem.
If that's a JSP, chances are that it takes parameters, which might be appended by the browser via JavaScript before the request is made; you should look at the real request your browser makes before trying to reproduce it. You can do that with the Chrome Developer Tools, Firefox LiveHTTPHeaders, etc.
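A minimal sketch of that approach with Python 3's urllib might look like this. The User-Agent string, the function names, and any query parameters are assumptions for illustration; the real parameters have to be taken from the request the browser makes. The sketch does not handle the login, which would still need a cookie-aware opener or mechanize.

```python
import urllib.parse
import urllib.request

# Assumed browser-like User-Agent string; any realistic value can be used.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"


def build_request(url, params=None, user_agent=BROWSER_UA):
    """Build a GET request with optional query parameters and a User-Agent."""
    if params:
        # Append to an existing query string if the URL already has one.
        separator = "&" if "?" in url else "?"
        url = url + separator + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def save_response(request, output_path):
    """Fetch the request and write the raw bytes to disk in binary mode."""
    with urllib.request.urlopen(request) as response, open(output_path, "wb") as f:
        f.write(response.read())
```

The bytes are written unchanged, so the saved file is the PNG itself and can be opened afterwards with an image library.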
I do hope you're not trying to break a captcha.
