Running a web-scraping Python file with an HTTP trigger on Azure - python

I currently web-scrape a website once per day to track changes over time, and I want to move the project to Azure so I don't need to keep my PC on while the scrape runs.
I've been exploring options and decided to use an Azure Function App, specifically an HTTP trigger.
I've gotten my Python file loaded in Visual Studio Code and linked it to my Azure portal; however, I am now stuck on the actual step of running the scrape from the HTTP trigger, as well as on storing the results in the Azure Table output binding I have created.
Can anyone point me in the right direction?
Thanks in Advance!
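For reference, a minimal sketch of what the function could look like, assuming the v1 Python programming model with a function.json that declares an httpTrigger input and an Azure Table output binding named outputTable (the binding name and the row fields here are placeholder assumptions):

    import json
    import datetime
    import azure.functions as func

    def main(req: func.HttpRequest, outputTable: func.Out[str]) -> func.HttpResponse:
        # run your existing scraping logic here (placeholder result)
        scraped = {"price": "9.99"}  # hypothetical scraped values

        # the Table output binding takes a JSON string; PartitionKey and
        # RowKey are required by Azure Table storage
        row = {
            "PartitionKey": "daily-scrape",
            "RowKey": datetime.datetime.utcnow().strftime("%Y%m%dT%H%M%SZ"),
            **scraped,
        }
        outputTable.set(json.dumps(row))

        return func.HttpResponse("Scrape complete and stored.", status_code=200)

Calling the function's URL then runs the scrape and writes one row per invocation. (A timer trigger would wire up the same way if you later want the daily run fully automated.)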

Related

Exporting Excel Data from Webpage

I am currently trying to write a Python script that will open my company's inventory system (a link in Google Chrome), sign in, and then click the "save as Excel" button at the top of a specific page. This should automate the daily process of opening the link, navigating to the right tab, and exporting the data.
Any idea where to start? I was thinking web scraping might get this done, but I'm not sure how to handle the login details. Also, how can I export the file once I'm in? I just need some ideas to start me on this journey. Any and all help is appreciated!
Simply start with Selenium automation in Python. Divide your whole process into smaller tasks and write Python code for each task. :)
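For example, a bare-bones Selenium sketch of that flow (the URLs and every element ID below are hypothetical placeholders; inspect the real pages to find the actual selectors):

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()

    # task 1: open the inventory system and sign in
    driver.get("https://inventory.example.com/login")   # hypothetical URL
    driver.find_element(By.ID, "username").send_keys("my-user")
    driver.find_element(By.ID, "password").send_keys("my-pass")
    driver.find_element(By.ID, "login").click()

    # task 2: open the page and click the "save as excel" button;
    # Chrome drops the file into its default download directory
    driver.get("https://inventory.example.com/inventory")  # hypothetical URL
    driver.find_element(By.ID, "export-excel").click()

    driver.quit()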

Deploy Scraping Scripts in Python

Hello, I have a Selenium script in Python that logs in to a webpage and extracts data. It takes around 50 seconds to execute, and I want to deploy the script as an API, but the API keeps timing out.
Alternatively, we could have the script save the data to a Google Sheet.
Can anyone suggest how I can do this, or point me to any relevant content?
Could you provide a screenshot of the API timeout, or the logs? Showing the Python code that makes the requests would also be helpful. (Sorry for answering instead of commenting; I don't have enough reputation points.)
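One common pattern when a job outlives the HTTP timeout is to acknowledge the request immediately and run the scrape in the background, writing the results to the Google Sheet when it finishes. A minimal Flask sketch of that idea (run_scrape is a placeholder for the existing ~50-second Selenium + Sheets logic):

    import threading
    from flask import Flask

    app = Flask(__name__)

    def run_scrape():
        # placeholder: the ~50-second Selenium job runs here and writes
        # its results to the Google Sheet when it is done
        pass

    @app.route("/scrape", methods=["POST"])
    def scrape():
        # respond right away so the client never hits its timeout
        threading.Thread(target=run_scrape, daemon=True).start()
        return {"status": "accepted"}, 202

The caller then checks the sheet (or a status endpoint) for the results instead of waiting on the original request.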

Can a Python + R file share a webdriver session between the languages?

I am working on a scraper built in RSelenium. A number of tasks are more easily accomplished in Python, so I've set up a .Rmd file with access to both R and Python code chunks.
The R side of the scraper opens a website in Chrome, logs in, and accesses and scrapes various pages behind the login wall. (This is being done with the permission of the website owners, who would rather have users scrape the data themselves than put together a downloadable.)
I also need to download files from these pages, a task I keep attempting in RSelenium but repeatedly come back to Python solutions for.
I don't want to take the time to rewrite the code in Python, as it's fairly robust, but my attempts to use Python end up opening a new driver, which starts a new session that is no longer logged in. Is there a way for Python code chunks to access an existing driver/session being driven by RSelenium?
(I will open a separate question with my RSelenium download issues if this solution doesn't pan out.)
As far as I can tell, and with help from user Jortega, Selenium does not support interacting with already-open browsers, so Python cannot access an existing session created via R.
My solution has been to rewrite the scraper using Python.

How do I call a Python function from a dashboard?

I'm building a dashboard in Data Studio and want to add a button-like feature that calls/runs a Python function to download some content locally (on the client machine). I have the Python function almost ready.
The question I have is where do I host this function so that it is callable from Data Studio?
Ideally, I wanted to create a Cloud Function that hosts the Python function and gets triggered when the button is clicked. That would work up to a point, but it will not download content locally. What options do I have to accomplish this?
Currently it is not possible to trigger a Cloud Function (or anything similar) from the Data Studio front end, especially when you intend to download content locally.
The only thing I can think of is to create a custom connector that calls the Cloud Function (via its URL trigger). Then use the connector to create a data source and attach that data source to a table or chart. That way, every time the page with the table/chart is refreshed, the connector will call the Cloud Function and retrieve the associated data.
A function can be triggered by HTTP, so you can make it publicly available and just drop a link to it in a dashboard.
Just add a gray square around this link and make it look like a button. You may want to check the box to open the URL in a new tab, so your dashboard is not closed when the link is clicked.
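If you go the link-as-button route, the function only needs to return its content with a download header so the browser saves it on the client machine when the link is opened. A minimal sketch using the Python Functions Framework (the payload and filename are placeholder assumptions):

    import functions_framework

    @functions_framework.http
    def download(request):
        # build the content to return (placeholder data)
        payload = "col1,col2\n1,2\n"

        # Content-Disposition: attachment makes the browser download the
        # response locally instead of rendering it in the tab
        headers = {
            "Content-Type": "text/csv",
            "Content-Disposition": "attachment; filename=export.csv",
        }
        return (payload, 200, headers)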

OAuth2 authentication failing when called from a Google Cloud Function

I am trying to create a Google Cloud Function that is called when a user clicks a button in Google Sheets. The function goes through the data and updates another sheet. I have been using pygsheets for this, but I would like to use Cloud Functions so that anyone can update the sheet without emailing me to open a terminal and run the script.
My initial error message says that when I run pygsheets.authorize(), the method tries to write to a read-only file system. I have tried to make my own custom credentials but have gotten nowhere with them. I have linked the conversation I had with the maintainer of the pygsheets project.
I am self-taught and would happily take any advice on how to resolve this issue. I would like to keep using pygsheets if possible, as I have not seen any Python Google Sheets tool on par with it. Please help; this issue has stumped me for a couple of weeks.
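For what it's worth, the default pygsheets.authorize() flow tries to cache an OAuth token on disk, which is what trips over the Cloud Functions read-only file system. One way around that is to authenticate with a service account built entirely in memory. A sketch assuming pygsheets 2.x (the key filename and spreadsheet key are placeholders):

    import pygsheets
    from google.oauth2 import service_account

    SCOPES = ("https://www.googleapis.com/auth/spreadsheets",
              "https://www.googleapis.com/auth/drive")

    def update_sheet(request):
        # build credentials from a service-account key bundled with the
        # function, so pygsheets never writes a token cache to disk
        creds = service_account.Credentials.from_service_account_file(
            "service-account.json", scopes=SCOPES)   # hypothetical filename
        client = pygsheets.authorize(custom_credentials=creds)

        sheet = client.open_by_key("SPREADSHEET_ID")  # hypothetical key
        # ... read the data and update the other sheet here ...
        return "updated"

Note that the spreadsheets have to be shared with the service account's email address, since it is a separate identity from your own Google account.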
