I am working on a scraper built in RSelenium. A number of tasks are more easily accomplished using Python, so I've set up a .Rmd file with access to R and Python code chunks.
The R side of the scraper opens a website in Chrome, logs in, and accesses and scrapes various pages behind the login wall. (This is being done with the permission of the website owners, who would rather have us users scrape the data ourselves than put together a downloadable.)
I also need to download files from these pages, a task I keep attempting in RSelenium but repeatedly fall back to Python solutions for.
I don't want to take the time to rewrite the code in Python, as it's fairly robust, but my attempts to use Python end up opening a new driver, which starts a new session that is no longer logged in. Is there a way for Python code chunks to access an existing driver/session being driven by RSelenium?
(I will open a separate question with my RSelenium download issues if this solution doesn't pan out.)
As far as I can tell, and with help from user Jortega, Selenium does not support interaction with already-open browsers, and Python cannot access an existing session created via R.
My solution has been to rewrite the scraper using Python.
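For anyone finding this later, the rewrite itself was not complicated. A minimal sketch of the Python side, assuming Selenium with Chrome; the URL, element IDs, and credentials are placeholders, not the real site's:

    # Minimal sketch: log in once, then scrape from the same driver session.
    # The URL and element IDs below are placeholders for the real site.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    driver.get("https://example.com/login")

    driver.find_element(By.ID, "username").send_keys("my_user")
    driver.find_element(By.ID, "password").send_keys("my_password")
    driver.find_element(By.ID, "submit").click()

    # Pages behind the login wall are now reachable from this one session.
    driver.get("https://example.com/data-page")
    for row in driver.find_elements(By.CSS_SELECTOR, "table tr"):
        print(row.text)

    driver.quit()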
Related
I am quite new to Electron and I couldn't find anything useful online, hence the question below:
Task I am trying to achieve:
The user provides a list of links they want to get data from.
Once the list is provided and the button is clicked, I go through all the websites, scrape their content, and present it to the user in an organized format.
To do this, I am using Python as the backend: I pass the list of links to a Python script, which launches each link via Selenium and scrapes its data.
In this process, there are two issues I want to find solutions for:
Each time a link is launched via Selenium, the Chrome/Firefox browser loads it in a window that is visible to the user.
Also, the link loads in an external browser instead of opening in the app itself.
Can anyone share the appropriate flow for launching the links via the Chromium browser inside Electron?
Thanks
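Not a full answer, but on the first issue: Selenium can run the browser headless, so no window is visible to the user. A minimal sketch, assuming headless Chrome and a placeholder link list. (Rendering pages inside the app itself is a separate problem; that would use Electron's own BrowserWindow/webview rather than Selenium.)

    # Sketch only: headless Chrome keeps the browser window from appearing.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By

    options = Options()
    options.add_argument("--headless")  # no visible browser window

    driver = webdriver.Chrome(options=options)

    links = ["https://example.com/a", "https://example.com/b"]  # placeholder list
    results = []
    for link in links:
        driver.get(link)
        # Grab whatever content is needed; full body text as a stand-in here.
        results.append({"url": link, "text": driver.find_element(By.TAG_NAME, "body").text})

    driver.quit()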
Is it possible to upload and manipulate a photo in the browser with GitHub Pages? The photo doesn't need to be stored beyond that session.
PS. I'm new to this area, and I am using Python to manipulate the photo.
GitHub Pages allows users to create static HTML sites. This means you have no control over the server that hosts the HTML files; it is essentially a file server.
Even if you did have full control over the server (e.g. if you hosted your own website), it would not be possible to have the client run Python code in the browser, since the browser only interprets JavaScript.
Therefore the easiest solution is to rewrite your code in JavaScript.
Failing that, you could offer a download link to your Python script and have users trust you enough to run it on their own computer.
I work in tech support and currently keep our product manuals updated by hand: I periodically check whether each one has an update and, if it does, replace the current copy saved on our network.
I was wondering whether it would be possible to build a small program to download all the files on a supplier's website and have them automatically sorted into the given folders for those products, replacing the current PDFs in each folder. I should also note that the website is password protected and organized into folders.
Is this possible with Python? I figured a small program I could run once a week or so to automatically update our manuals would be super useful (and a learning experience).
Apologies if I haven't explained the requirement well; if you have any questions, let me know.
It's certainly possible. As the other answer suggests, you will want to use libraries like requests (handles HTTP requests) or Selenium (automates browser activity) to navigate through the login.
You'll need to sort through the links on a given page, ideally with BeautifulSoup (an HTML parser), though it could also be done with Selenium. You'll then want requests for downloading the PDFs and the os module for sorting the downloads into specific folders and replacing files.
I strongly urge you to think through the steps, but I hope that gives an idea of the libraries you'll need to learn a bit about. The most challenging one to learn will be Selenium, so if you can use requests to handle the login, that is much better.
If you've got a basic grasp of Python, then requests, the os module, and BeautifulSoup are not difficult to pick up.
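To make that concrete, here is a rough sketch of the requests + BeautifulSoup + os flow, assuming the supplier's site uses a plain form login; every URL, form field, and folder name below is a placeholder:

    # Rough sketch: log in with requests, find PDF links, save them into a folder.
    # All URLs, form fields, and paths are placeholders for the real site.
    import os
    from urllib.parse import urljoin

    import requests
    from bs4 import BeautifulSoup

    BASE = "https://supplier.example.com"

    session = requests.Session()
    session.post(BASE + "/login", data={"username": "me", "password": "secret"})

    page = session.get(BASE + "/manuals")
    soup = BeautifulSoup(page.text, "html.parser")

    os.makedirs("manuals", exist_ok=True)
    for link in soup.find_all("a", href=True):
        if link["href"].lower().endswith(".pdf"):
            url = urljoin(BASE, link["href"])
            target = os.path.join("manuals", os.path.basename(link["href"]))
            # Overwrite the existing copy so the saved manual stays current.
            with open(target, "wb") as f:
                f.write(session.get(url).content)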
You can use Selenium for browser automation. It can enter the password (although any "are you a robot" check might stop you), and then you can download the PDFs simply by setting a default download location and clicking the download button. The browser will then save the files to that location.
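A minimal sketch of that, assuming Chrome: the download preferences are standard Chrome profile options, while the URL and link text are placeholders.

    # Sketch: point Chrome's default download directory at the manuals folder;
    # clicking a download link then saves the PDF there without a save dialog.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By

    options = Options()
    options.add_experimental_option("prefs", {
        "download.default_directory": r"C:\manuals",  # placeholder path
        "download.prompt_for_download": False,
        "plugins.always_open_pdf_externally": True,   # download PDFs, don't preview
    })

    driver = webdriver.Chrome(options=options)
    driver.get("https://supplier.example.com/manuals")         # placeholder URL
    driver.find_element(By.LINK_TEXT, "Download PDF").click()  # placeholder link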
I want to scrape data from a website where I have to log in first. The problem is that there is robot protection too (so I have to verify that I am not a robot, plus a reCAPTCHA), and my chance of passing the captcha is ~30%, which is horrible for me.
Is there maybe another possibility: log in with my browser (for example Chrome or Firefox), and then use that session ID in my Python script to scrape the data automatically?
So, more simply: I want to scrape tables from a website, so I have to log in first. The 30% success rate is not good enough for me, so I hope there is another possibility: log in manually, and then use that session in Python.
After that, there is a textbox on the page where I want to type what I'm searching for, which then navigates to the page where I'll find the table and the data.
Any ideas? Is it possible?
(Right now I only have a script where I have to download the HTML code of the data page and then change some names in the code manually. It is a very big waste of time, and I hope I can automate it further.) - Python 2.7
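One commonly suggested route for the "log in manually, then reuse the session" idea is to log in in a normal browser, copy the session cookie out of the developer tools, and hand it to a requests session. A sketch, with a placeholder cookie name, URL, and query parameter; it only works while that browser session is still valid:

    # Sketch: reuse a session cookie copied from the browser's developer tools.
    # Cookie name, URL, and parameter are placeholders for the real site.
    import requests

    session = requests.Session()
    session.cookies.set("PHPSESSID", "value-copied-from-browser")

    # The search box presumably submits a query; mimic that request directly.
    response = session.get("https://example.com/search", params={"q": "search term"})

    # The table can then be parsed out of response.text (e.g. with BeautifulSoup).
    print(response.text)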
There is a website that I frequently go to to play a browser game. I want some kind of Firefox plugin that can scrape data off the page and send it to a Python script. I want the controls for the program (toggle on/off) to be an HTML display added onto the webpage every time I load it.
I am aware of plugins like Greasemonkey, but I don't want to use them, because to send any data to Python I would have to set up a Python HTTP server and manually launch it every time I want to use my program.
Essentially this is what I want to be able to do:
Open Firefox as I normally would for any kind of internet browsing.
Go to the website which has my game.
The game loads, and JavaScript code executes which adds some basic HTML controls that can be used to toggle settings in my backend Python program.
If I choose to enable the program, JavaScript will parse the page when necessary and send that data to a Python script on my machine.
The Python program executes, receives the data, and does what I want.
I feel like this should be a simple task, but I can't find anything straightforward. From what I have been reading, I can make a Firefox extension that does this, but the tutorials I have seen are all about adding extra features to the browser. I just want a minimal tutorial, since all I need is to run my own JavaScript when visiting website "X" and then call a Python script.
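For what it's worth, the WebExtensions feature built for exactly this is "native messaging": the extension declares a local program in a native manifest, and Firefox launches that program itself and exchanges length-prefixed JSON with it over stdin/stdout, so no HTTP server is needed. A minimal sketch of the Python side of the protocol; what to do with the received data is left as a placeholder:

    # Minimal sketch of a Firefox native-messaging host in Python.
    # Each message is a 4-byte length (native byte order) followed by UTF-8 JSON.
    import json
    import struct
    import sys

    def read_message():
        raw_length = sys.stdin.buffer.read(4)
        if not raw_length:
            sys.exit(0)  # the extension disconnected
        length = struct.unpack("=I", raw_length)[0]
        return json.loads(sys.stdin.buffer.read(length).decode("utf-8"))

    def send_message(message):
        encoded = json.dumps(message).encode("utf-8")
        sys.stdout.buffer.write(struct.pack("=I", len(encoded)))
        sys.stdout.buffer.write(encoded)
        sys.stdout.buffer.flush()

    while True:
        data = read_message()  # the page data sent by the extension's JavaScript
        # ... hand it to the rest of the backend here (placeholder) ...
        send_message({"status": "received"})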