How to control chromium in electron process via python script - python

I am quite new to electron and I couldn't find anything useful online. Hence, the below doubt:
Task I am trying to achieve:
User provides list of links where he/she wants to get data from.
Once the list is provided and button is clicked, I am trying to go through all the websites, scrape the content from them and provide it to user in an organized format.
To do this, I am using python as a backend wherein, I am passing the list of links to python script and scraping data from each link by first launching it via selenium.
In this process, there are two issues which I want to find solutions for:
Each time the link is launched via selenium, chrome/firefox browser loads the link which is visible to user.
Also, the link loads in an external browser instead of opening in the software itself.
Can anyone share the appropriate flow by which the link will be launched via chromium browser in electron.
Thanks

Related

opening a web browser and get url histories in python

I am trying to make a python gui application.
What I want to do is to open a web browser by clicking a button. (Tkinter)
When the web browser is opened, I do login.
After logging it, it will redirect to the page.
And that page url will consist of code as a param I need to use later in code.
I used webbrowser.open_new('') to open a web browser.
But the limitation was it is only for opening.. there was no way to get the final redirected url I need.
Is there a way I can use to open a web browser and do something on that page and finally get that final url?
I am using python.
There are a few main approaches for automating interactions with browsers:
Telling a program how and what to click, like a human would, sometimes using desktop OS automation tools like Applescript
Parse files that contain browser data (will vary browser to browser, here is Firefox)
Use a tool or library that relies on the WebDriver protocol (e.g. selenium, puppeteer)
Access the local SQLite database of the browser and run queries against it
Sounds like 3 is what you need, assuming you're not against bringing in a new dependency.

Can a Python + R file share a webdriver session between the languages?

I am working on a scraper built in RSelenium. A number of tasks are more easily accomplished using Python, so I've set up a .Rmd file with access to R and Python code chunks.
The R-side of the scraper opens a website in Chrome, logs in, and accesses and scrapes various pages behind the login wall. (This is being done with permission of the website owners, who would rather users scrape the data ourselves than put together a downloadable.)
I also need to download files from these pages, a task which I keep trying in RSelenium but repeatedly come back to Python solutions.
I don't want to take the time to rewrite the code in Python, as it's fairly robust, but my attempts to use Python result in opening a new driver, which starts a new session no longer logged in. Is there a way to have Python code chunks access an existing driver / session being driven by RSelenium?
(I will open a separate question with my RSelenium download issues if this solution doesn't pan out.)
As far as I can tell, and with help from user Jortega, Selenium does not support interaction with already open browsers, and Python cannot access an existing session created via R.
My solution has been to rewrite the scraper using Python.

Python screenshot especific tab each time it loads

The problem: I want to write a Python script that takes a screenshot of a website I have opened in a browser each time it loads.
The thing is that I have a website where there are like 300 exam questions which I can get through, try each one of them and I will have the correction when I submit my answer. I will not have access to this questionnaire after a certain date, but I want to keep the questions (which I could write down, but laziness is strong in me, and want to learn Python).
The "attempt": I thought of doing a simple Python script with imgkit to take the screenshots. I'm opened to other suggestions, as imgkit was the first thing I saw while looking for this, and the code looks plain and simple to me:
import imgkit
imgkit.from_url('http://webpage.com', 'out.jpg')
But I have to provide the url for each webpage, and that will be more tedious than taking a screenshot with OS features, thus I want to automatize it.
The questions:
There is a way to make Python monitor a browser tab and take a screenshot each time it reloads (that will be when a new question appears)?
Or maybe get the tab's URL to pass it to imgkit and take the screenshot.
Another thing that I saw is that imgkit can generate a "screenshot" from a HTML file. Can Python download the HTML code from a tab I have open in my browser?
Selenium is your friend here. It is a framework designed for testing but it will make what you want really easy.
Selenium allows you to spin-up a web browser and control it. So you can instruct it to go to the web address you want and then do things. Normally you would instruct it to click here, write in a form, etc.
In your case you only want it to open a certain address, take a screenshot, go the the next address and repeat.
Here you have a tutorial on how to do exactly what you want.
The specific code is:
from selenium import webdriver
#1. Get the driver to manage the web-browser you choose
driver = webdriver.Chrome()
#2. Go the the webadress you want
driver.get('https://python.org')
#3. Take a screenshot
driver.save_screenshot("screenshot.png")
driver.close()
PS: In order for the tutorial to run you will need to have installed the web driver for Selenium to be able to spin-up and run Chrome. Here are the instructions for that.

Programmatically access and modify.aspx page

I am working on a project which needs to programmatically access and update a .aspx (ASP.NET) page. Specifically, I need to automatically access this page, use several html and JavaScript elements (click checkboxes, enter text in form fields, "click" buttons), and reload the page. Also, during the time the page is accessed, there is information being sent back and forth between the client and server.
What is the most efficient way to go about this? I am most likely thinking about writing something in bash + python to do this but I am not sure it is the best tool for the job.
Thanks
The optimal solution for your problem is using Selenium with python.
The selenium package is used to automate web browser interaction from Python.
pip install -U selenium
You can read the documentation to get familiar with the Selenium Webdriver API.
You cannot edit the pages that are hosted by others, but you can mimic the requests using selenium.

How can I make a simple firefox extension which can call a python script?

There is a website that I frequently go to and play a browser game. I want to be able to have some kind of firefox plugin that can scrape data off of the page and send it to a python script. I want the controls for the program (toggle on/off) to be a HTML display which is added on to the webpage after every time I load it.
I am aware of plugins like Greasemonkey, but I don't want to use this because if I want to send any data to python, I have to setup a python http server and manually launch it every time I want to use my program.
Essentially this is what I want to be able to do:
Open Firefox as I would normally to do any kind of internet browsing
Go to the website which has my game.
The game is loaded, javascript code is executed which adds some basic HTML controls which can be used to toggle settings in my backend python program
If I choose to enable the program, javascript will parse the page when necessary and send that data to a python script on my machine.
The python program executes, recieves the data, and does what I want.
I feel like this should be a simple task, but I can't find anything straightforward. From what I have been reading, I can make a Firefox extension which can do this, but the tutorials I have seen are all for things like adding extra features to the browser. I just want a minimal tutorial since all I need to do is just run my own javascript when visiting website "X" and then call a python script.

Categories