Python - requests_html screen scraping

I’m trying to log in to a pretty complex (to my beginner’s eye) website and make a reservation. I did not know a single Python statement before starting the project. After many starts and stops I have successfully logged in using requests_html/HTMLSession, overcome the security/authorization issues, and arrived at the target page. The page displays the server time, and I cannot hit the proper key until that time reaches 7:00 AM.
I am unable to access the field. I have tried the .search and .find commands, but nothing works. I am hoping someone can tell me how to read the time into my program so I can test it and wait until it reaches, or almost reaches, 7:00. (I say almost because the reservation is for tee times and there is a real crunch at 7; the whole point of this application is to automate the process and be the fastest!)
So I need to be able to load the time into my Python program and click a date field when the clock reaches 7:00.

No idea what scraping tool you are using, but generally you would access this element via an XPath or CSS selector:
response.css(".jquery_server_clock::text").extract()
This example assumes you are using Scrapy.
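Since you mention requests_html, the rough equivalent there would be a CSS find on the rendered page; here is a minimal sketch (the URL is a placeholder, and the .jquery_server_clock class is borrowed from the example above):

from requests_html import HTMLSession

session = HTMLSession()
r = session.get("https://example.com/reservations")  # placeholder URL
# If the clock is filled in by JavaScript, the raw HTML won't contain it;
# render() executes the page's scripts first (requires pyppeteer/Chromium).
r.html.render()
clock = r.html.find(".jquery_server_clock", first=True)
if clock is not None:
    print(clock.text)

Keep in mind a JavaScript clock only reflects the moment the page was rendered, so you may need to re-fetch or re-render to watch it tick.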

Maybe you would be better off using Selenium.
Selenium allows you to automate a browser window. It may be that the site cannot be driven with requests at all, but with Selenium the site thinks it is being visited by a normal browser while you automate everything.
So what I would do if I were you:
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("your_url.com")
input("Navigate to the desired page, then press enter")
# Poll the server clock until its first digit reads "7"
while not driver.find_element_by_class_name("jquery_server_clock").text[0] == "7":
    pass
driver.find_element_by_class_name("other_button").click()
This would wait until it is 7 AM and then click the other button immediately.
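A busy loop works, but if you prefer a built-in timeout, Selenium's WebDriverWait accepts any callable as a condition. A sketch under the same assumptions (the class names are the placeholders from above; newer Selenium releases spell element lookup as driver.find_element(By...)):

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

# Wait up to an hour for the clock's first digit to become "7", polling every 100 ms
WebDriverWait(driver, 3600, poll_frequency=0.1).until(
    lambda d: d.find_element(By.CLASS_NAME, "jquery_server_clock").text.startswith("7")
)
driver.find_element(By.CLASS_NAME, "other_button").click()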

Related

Show name of the current open website in any browser using python code

I have been working on Python code that shows the time spent on an application, based on keyboard and mouse movement.
I can set the start and stop time, and the output of this code is the duration I spend on each application within the set window.
I have also added "idle time", which accrues when I am not working on the system.
My question is: when I am using a browser such as Chrome or Firefox, I need the name of the website that is open.
This will help me track how much time I am spending on those websites.
You should 100% use Selenium. It's a browser automation tool with Python bindings, and I personally use it to automate YouTube uploads. You can do what you're trying to do with a few commands; Selenium is one of the best tools for this use case!
Here's the GitHub repo: https://github.com/SeleniumHQ/selenium
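One caveat: Selenium can only report on a browser it launched itself, not on an arbitrary window you already have open. Within that limit, reading the active page's name is two attribute lookups; a minimal sketch (the URL is just an example):

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.python.org")
print(driver.title)        # human-readable page title
print(driver.current_url)  # URL of the page in the active tab
driver.quit()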

Selenium Grid + Python: clicking on a WebElement is very slow

I have built a bot that plays an online roulette with Selenium (Selenium Grid) and Python.
When it comes to clicking on the number I want to bet on, it is extremely slow: within the time allowed for the bet, it does not manage to place chips on all the numbers my stake requires.
It seems the slowness comes from the animation the button plays after I click it.
The code is very simple:
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Retrieving the WebElement is fast, no problem here
element = WebDriverWait(driver, timeout).until(EC.presence_of_element_located((By.XPATH, path)))
element.click()  # this is the slow part
Here you can find:
how it looks now > https://drive.google.com/file/d/1dEuWTtrXHzRfXXVHhUbdNR8XtgMeWdU-/view?usp=sharing
my target > https://drive.google.com/file/d/1NUbr6rpOGjdMuClD5hby91jPVumqwLC5/view?usp=sharing (here I use the pynput library, which is not my target because I want the script to run on the server using Selenium Grid).
Can anyone help?
I'm not actually sure whether it is the same problem or not. In my case, after clicking the submit button on a login form and being redirected to the home page, my script would not do anything for around 4 minutes.
I noticed that WebElement.click() only returns after the page stops loading, but some trackers on the site prevented the page from ever finishing loading, so I added the uBlock extension and got rid of my problem.
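If adding an ad blocker isn't practical, another angle (an assumption on my part, not something tried in the answer above) is Selenium's page load strategy, which controls how long navigation blocks on page loads:

from selenium import webdriver

options = webdriver.ChromeOptions()
# "eager" returns once the DOM is ready, without waiting for trackers/images
options.page_load_strategy = "eager"
driver = webdriver.Chrome(options=options)

With "eager" (or "none") the driver stops waiting for stragglers such as tracker requests, at the cost of having to wait explicitly for the elements you need.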

How do I hide the fact I'm using a bot?

For my Python Selenium script I have to complete a lot of Captchas. I noticed that when I get the Captchas in my regular browser they're much easier and quicker. Is there a way for me to hide the fact that I'm using a web automation bot so I get the easier Captchas?
I already tried randomizing the User-Agent, but with no success.
Go to your website and inspect the page, then open the Network tab. Reload the page and select the request for the page you are accessing from the list. If you scroll down, you can see the User-Agent string that your browser sends to the page. Use that User-Agent in your scraper to mimic your browser exactly.
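With Selenium and Chrome, the User-Agent can be set through a command-line switch; a minimal sketch (the string below is a stand-in; paste the one copied from your own Network tab):

from selenium import webdriver

my_ua = "PASTE-YOUR-BROWSER-UA-HERE"  # copied from the Network tab, not a real UA
options = webdriver.ChromeOptions()
options.add_argument(f"user-agent={my_ua}")
driver = webdriver.Chrome(options=options)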
From a generic perspective there are no proven ways to hide the fact that you are using a Selenium driven web automation bot.
You can find a relevant detailed discussion in Can a website detect when you are using Selenium with chromedriver?
However, at certain times modifying the navigator.webdriver flag helps to prevent detection.
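With the Chrome driver this is commonly done over the DevTools protocol, so the override is in place before any page script runs; a sketch (assuming an existing webdriver.Chrome instance named driver):

driver.execute_cdp_cmd(
    "Page.addScriptToEvaluateOnNewDocument",
    {"source": "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"},
)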
References
You can find a couple of relevant detailed discussions in:
Is there a way to use Selenium WebDriver without informing the document that it is controlled by WebDriver?
Selenium Chrome gets detected
How does recaptcha 3 know I'm using selenium/chromedriver?

Python: screenshot a specific tab each time it loads

The problem: I want to write a Python script that takes a screenshot of a website I have opened in a browser each time it loads.
The thing is that I have a website with around 300 exam questions that I can work through; I try each one and get the correction when I submit my answer. I will not have access to this questionnaire after a certain date, but I want to keep the questions (which I could write down, but laziness is strong in me, and I want to learn Python).
The "attempt": I thought of doing a simple Python script with imgkit to take the screenshots. I'm open to other suggestions; imgkit was the first thing I saw while looking for this, and the code looks plain and simple to me:
import imgkit
imgkit.from_url('http://webpage.com', 'out.jpg')
But I have to provide the URL for each webpage, and that will be more tedious than taking a screenshot with OS features, so I want to automate it.
The questions:
Is there a way to make Python monitor a browser tab and take a screenshot each time it reloads (that is, whenever a new question appears)?
Or maybe get the tab's URL to pass it to imgkit and take the screenshot.
Another thing I saw is that imgkit can generate a "screenshot" from an HTML file. Can Python download the HTML code from a tab I have open in my browser?
Selenium is your friend here. It is a framework designed for testing, but it will make what you want really easy.
Selenium lets you spin up a web browser and control it. You can instruct it to go to the web address you want and then do things; normally you would tell it to click here, fill in a form, and so on.
In your case you only want it to open a certain address, take a screenshot, go to the next address, and repeat.
Here you have a tutorial on how to do exactly what you want.
The specific code is:
from selenium import webdriver
#1. Get the driver to manage the web-browser you choose
driver = webdriver.Chrome()
#2. Go to the web address you want
driver.get('https://python.org')
#3. Take a screenshot
driver.save_screenshot("screenshot.png")
driver.close()
PS: In order for the tutorial to run you will need to have installed the web driver for Selenium to be able to spin-up and run Chrome. Here are the instructions for that.
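To cover all 300 questions without pasting URLs by hand, you can drive the same two calls in a loop. A sketch assuming the question pages follow a predictable, hypothetical URL pattern (adjust to the real site):

from selenium import webdriver

driver = webdriver.Chrome()
for n in range(1, 301):
    driver.get(f"http://webpage.com/question/{n}")  # hypothetical URL pattern
    driver.save_screenshot(f"question_{n:03d}.png")  # zero-padded so files sort in order
driver.close()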

Selenium Webdriver for Python: get page, enter values, click submit, get source

Alright, I'm confused. So I want to scrape a page using Selenium Webdriver and Python. I've recorded a test case in the Selenium IDE. It has stuff like
Command    Target
click      link=14
But I don't see how to run that in Python. The desirable end result is that I have the source of the final page.
Is there a run_test_case command? Or do I have to write individual command lines? I'm rather missing the link between the test case and the actual automation. Every site tells me how to load the initial page and how to get stuff from that page, but how do I enter values and click on stuff and get the source?
I've seen:
submitButton=driver.find_element_by_xpath("....")
submitButton.click()
Ok. And enter values? And get the source once I've submitted a page? I'm sorry that this is so general, but I really have looked around and haven't found a good tutorial that actually shows me how to do what I thought was the whole point of Selenium Webdriver.
I've never used the IDE. I just write my tests or site automation by hand.
from selenium import webdriver
browser = webdriver.Firefox()
browser.get("http://www.google.com")
print(browser.page_source)
You could put that in a script and just do python wd_script.py, or you could open up a Python shell, type it in by hand, and watch the browser open up and get driven by each line. For this to work you will obviously need Firefox installed as well. Not all versions of Firefox work with all versions of Selenium, but the latest versions of each at the time of writing (Firefox 19, Selenium 2.31) do.
An example showing logging into a form might look like this:
username_field = browser.find_element_by_css_selector("input[type=text]")
username_field.send_keys("my_username")
password_field = browser.find_element_by_css_selector("input[type=password]")
password_field.send_keys("sekretz")
browser.find_element_by_css_selector("input[type=submit]").click()
print(browser.page_source)
This kind of stuff is much easier to write if you know CSS well. Weird errors can be caused by trying to find elements that are generated by JavaScript: you might be looking for them before they exist, for instance. It's easy enough to tell if this is the case by putting in a time.sleep for a little while and seeing if that fixes the problem. More elegantly, you can abstract a general wait-for-element function, as sketched below.
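A minimal version of such a helper, using Selenium's explicit-wait machinery (the function name is my own):

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

def wait_for_css(browser, selector, timeout=10):
    # Blocks until an element matching `selector` is present, or raises TimeoutException
    return WebDriverWait(browser, timeout).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, selector))
    )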
If you want to run Webdriver sessions as part of a suite of integration tests then I would suggest using Python's unittest to create them. You drive the browser to the site under test, and make assertions that the actions you are taking leave the page in a state you expect. I can share some examples of how that might work as well if you are interested.
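As a rough illustration of that shape (the site and assertion are placeholders, not a real test suite), a Webdriver test under unittest might look like:

import unittest
from selenium import webdriver

class GoogleTitleTest(unittest.TestCase):
    def setUp(self):
        self.browser = webdriver.Firefox()

    def test_title_contains_google(self):
        self.browser.get("http://www.google.com")
        self.assertIn("Google", self.browser.title)

    def tearDown(self):
        self.browser.quit()

if __name__ == "__main__":
    unittest.main()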
