I have a .csv file with a list of URLs I need to extract data from. I need to automate the following process: (1) Go to a URL in the file. (2) Click the chrome extension that will redirect me to another page which displays some of the URL's stats. (3) Click the link in the stats page that enables me to download the data as a .csv file. (4) Save the .csv. (5) Repeat for the next n URLs.
Any idea how to do this? Any help greatly appreciated!
There is a python package called mechanize. It helps you automate the processes that can be done on a browser. So check it out.I think mechanize should give you all the tools required to solve the problem.
Related
I am trying to read several excel files available on this website https://www.motilaloswalmf.com/download/month-end-portfolio, using python's request library. However I am not able to figure out the exact url for downloading excels through network tab.
Can someone please help with it ? Thank you !
The given site needs JS rendering to extract the excel file links. You need selenium or playwright to achieve your goal.
I am currently trying to write a python script that will open my companies inventory system which is a link in google chrome, sign in, and then click the save as excel button that is posted on top of a specific page. This will hopefully automate the process of opening the link, navigating over to the tab, clicking export, then exporting this data daily.
Any idea of where to start? I was thinking maybe can get this done using web scraping but not sure with the log in details needed. Also, how can I export this file once in? Just need some ideas to start me on this journey. Any and all help is appreciated!
Simply start with selenium python automation
Divide you whole process in smaller tasks and write python code
for that each task:)
So the issue I am having isn't that there is a link of a PDF on the web I am trying to scrape and download onto my PC (It doesn't end in .pdf). I have a download link that I want to activate, which would then lead me to download a PDF onto my computer. It looks like this:
https://***.com/files/4122109/download?download_frd=1&verifier=xxx
When I click the link, it verifies I am the user that I am, and then lets me download the file with the ID contained in the above query. The content-type for this file is "application/pdf" so I know it downloads a PDF file for me. I just need a library that "clicks" or "activates" the download for me.
Also, I am trying to do this for all the URLs I am pulling from a course on Canvas in a GET request. I am not trying to use Selenium here because I am getting these URLs from an API. Any advice in this approach would be highly appreciated.
I'm trying to download a .xls from this Site
I need to somehow click on the second button("Exporta informácion diária") on the grid and download the .xls file.
I tried with requests and beautifulsoup but didnt work.
After that, tried with selenium just for some tests and i managed to do what i needed.
Can someone please explain how can i download the .xls file without using a headless browser?
Thank You.
To do this, you first need to understand what the flow of network requests that performs the download.
The easiest way is to open the developer tools in the browser you are using. And follow the appropriate requests.
In your case, there is an POST Request, Which returns the exact address to the file.
Download it with a GET request.
How can I download all the pdf (or specific extension files like .tif or .pdf) from a webpage that requires login. I dont want to log in everytime for every pdf so I cant use link generation and pushing to browser scheme
The solution was simple: just posting it for others may have the same question
mydriver.get("https://username:password#www.somewebsite.com/somelink")