How to simulate an AJAX call (XHR) with Python and mechanize

I am working on a project that does online homework automatically.
I am able to log in, find exercises, and even fill in the form using mechanize.
I discovered that the submit button triggers a JavaScript function, so I searched for a solution. A lot of the answers say to 'simulate the XHR', but none of them talk about the details.
I don't know if this screen cap helps.
http://i.stack.imgur.com/0g83g.png
Thanks

If you want to evaluate JavaScript, I'd recommend using Selenium. It will open a browser that you can then control from Python.
First, install Selenium: https://pypi.python.org/pypi/selenium
Then download the Chrome driver from here: https://code.google.com/p/chromedriver/downloads/list
Put the binary in the same folder as the Python script you're writing. (Or add it to your PATH; more information here: https://code.google.com/p/selenium/wiki/ChromeDriver)
Afterwards the following example should work:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
driver.get("http://www.python.org")
assert "Python" in driver.title

# find the search box, type a query, and press Enter
elem = driver.find_element_by_name("q")
elem.send_keys("selenium")
elem.send_keys(Keys.RETURN)

# the search stays on python.org, so check the results page rather than the title
assert "No results found." not in driver.page_source
driver.close()
More information is in the Selenium with Python documentation (the example above is adapted from there).

An XHR is the same as a regular HTTP request. Make it the same way and then deal with the response. Open your browser's developer tools, watch the Network tab for the request the submit button fires, and replay that request with the same URL, method, headers, and body.
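A minimal sketch of that idea with mechanize, since the question already uses it, assuming the Python 3 port of mechanize (0.4+). The endpoint and form fields below are hypothetical; copy the real URL, method, headers, and body from the request your dev tools show when you click submit:
import mechanize
from urllib.parse import urlencode

br = mechanize.Browser()
br.set_handle_robots(False)
# headers the page's JavaScript would send; many frameworks check
# X-Requested-With to decide whether a request counts as an XHR
br.addheaders = [
    ('User-Agent', 'Mozilla/5.0'),
    ('X-Requested-With', 'XMLHttpRequest'),
]
# hypothetical endpoint and fields; take the real ones from the Network tab
payload = urlencode({'exercise_id': '123', 'answer': '42'})
response = br.open('https://example.com/ajax/submit', data=payload)
print(response.read())
If the response is JSON rather than HTML, parse it with the json module instead of feeding it back into mechanize.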

Related

Python Selenium: unable to locate element for the Google Maps terms of service

I am trying to automate fetching the short Google link via this code:
import time
from selenium import webdriver

link = 'https://www.google.com/maps/place/Sport+La+Pava/@41.273359299999996,2.0005245,14z/data=!4m8!1m2!2m1!1sSport+La+Pava!3m4!1s0x12a49d5b3d4b1753:0xeb7e41655fa9ec91!8m2!3d41.273359299999996!4d2.0005245'
CHROME_DRIVER_PATH = r"D:\chromedriver\chromedriver.exe"
driver = webdriver.Chrome(executable_path=CHROME_DRIVER_PATH)
driver.get(link)
time.sleep(3)
button1 = driver.find_element_by_id("introAgreeButton")
button1.click()
# read the shortened link out of the input's value attribute
new_https = driver.find_element_by_xpath('/html/body/jsl/div[3]/div[2]/div/div[2]/div/div[3]/div/div/div[1]/div[4]/div[2]/div[1]/input').get_attribute('value')
print(new_https)
The link is a Google Maps link.
The error happens at button1 = driver.find_element_by_id("introAgreeButton"). The button I am trying to click here is basically the terms and conditions; I have to accept it, but every time I receive a NoSuchElementException.
I have tried different methods: XPath, full XPath, CSS; nothing works.
I use the same code for websites like amazon.com and everything works fine there, so it is not about the location of my webdriver or anything like that. It seems quite specific to the Google terms and conditions.
As @Nick pointed out, this question has been answered (the code was in JavaScript). Here is the code in Python for those who need it:
# the consent button lives inside an iframe, so switch into it first
driver.switch_to.frame(0)
driver.find_element_by_id("introAgreeButton").click()
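For anyone who hits timing issues, here is a slightly more robust sketch with explicit waits. It assumes the consent dialog still lives in the first iframe on the page (Google changes this markup from time to time):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
# any Maps URL triggers the consent page in regions that require it
driver.get("https://www.google.com/maps")
wait = WebDriverWait(driver, 10)
# wait for the consent iframe, then switch into it (frame index 0 assumed)
wait.until(EC.frame_to_be_available_and_switch_to_it(0))
wait.until(EC.element_to_be_clickable((By.ID, "introAgreeButton"))).click()
# switch back to the main document before working with the map page
driver.switch_to.default_content()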

Using webbrowser instead of Selenium

I am trying to make an automatic YouTube searcher, and this is as far as I got. I don't really understand how to use Selenium, so I am using webbrowser. What's currently not happening: it is not opening or typing into the search bar (I used the XPath). Does anyone know a tutorial for Selenium?
import webbrowser
import time
url = 'http://youtube.com'
chrome_path = 'C:/Program Files (x86)/Google/Chrome/Application/chrome.exe %s'
webbrowser.get(chrome_path).open(url)
LastName = "nice"
time.sleep(2)
last = webbrowser.open('//*[@id="search"]')
webbrowser.get(LastName)
You can't drive the actual Google Chrome browser directly.
Instead you have to use ChromeDriver, which lets Selenium control a standard Chrome instance.
You can download it here.
Once installed, specify the correct path to chromedriver.
The Python webbrowser module is not intended for interaction with UI objects. Its purpose is just to open a browser to display an arbitrary web document from your Python code.
If you check the module's documentation page you will see that webbrowser.open(...) takes a URL as its parameter. It cannot work with XPath or other types of selectors, and it cannot send any command to the browser other than "open that page".
So you have to use a webdriver.
I would highly recommend Selenium, because it is not very hard to learn and integrate.
from selenium import webdriver

api = webdriver.Firefox()
api.get("https://www.youtube.com")
# YouTube has more than one element with id="search"; index 1 is the query box
elem = api.find_elements_by_id("search")[1]
elem.send_keys("Your Search")
elem.submit()
Here you only have to change the search text and it works. Just download Firefox and geckodriver, put geckodriver in the same directory as the script, and it should work.
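If geckodriver is not on your PATH, newer Selenium releases let you point at the binary explicitly. A minimal sketch assuming Selenium 4's Service API and a geckodriver binary next to the script:
from selenium import webdriver
from selenium.webdriver.firefox.service import Service

# assumed location: geckodriver sits in the same directory as this script
service = Service(executable_path="./geckodriver")
api = webdriver.Firefox(service=service)
api.get("https://www.youtube.com")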

Why is a web page opened through selenium different from a normal browser?

I'm learning how to crawl data with Selenium, but I find that when a website is opened through Selenium it's different from what I get with a normal browser, even when I add headers. I'm very confused.
I really want to upload two screenshots for contrast, but I can't upload them on Stack Overflow at present. I even tried opening the Chrome driver and entering the web address manually, but the result is still different.
I use Python 3.6, Selenium, and Chrome 75.0.3770.80.
from selenium import webdriver

driver = webdriver.Chrome()  # create the driver instance
url = 'https://www.free-ss.ooo'
driver.get(url)
At present I can't post pictures on Stack Overflow, but I just want to figure out how to use Selenium to get the normal web page.
Aha, I found the problem: the target site detected Selenium. The solution is to add an option:
chrome_options.add_experimental_option('excludeSwitches', ['enable-automation'])
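In full, that looks something like this; a sketch assuming a recent Selenium/Chrome combination:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
# hide the "enable-automation" switch that advertises the browser as automated
chrome_options.add_experimental_option('excludeSwitches', ['enable-automation'])
driver = webdriver.Chrome(options=chrome_options)
driver.get('https://www.free-ss.ooo')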
I faced the same issue and was able to resolve it by removing (or fixing) the user-agent argument; it then worked fine in both headless and non-headless mode.
The resolution was inspired by PDHide's post.
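For the user-agent route, a minimal sketch; the string below is just an example of a realistic desktop user-agent, not a required value:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument('--headless')  # per the answer, this works headless too
# example desktop user-agent; headless Chrome otherwise advertises "HeadlessChrome"
options.add_argument(
    'user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
    'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36'
)
driver = webdriver.Chrome(options=options)
driver.get('https://www.free-ss.ooo')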

Python: Any way to webscrape and detect changes in a single-page application?

So I'm trying to web-scrape a website and check for a specific change, and the website has a search bar into which I need to enter something to reach the specific page I want to scrape. The problem is, the website is a single-page application, where the URL does not change after the page is refreshed with new results. I have tried using requests, but it is no use as it depends on the URL...
Is there a method in requests, or a Python library, that can bypass this issue and allow me to move forward with my idea?
My suggestion: try opening the page with the developer console and check what kind of requests the SPA sends when you enter your data (the XHR requests are what interest you): the URL, the payload format, and so on. Then mimic the webpage. Create a session object with requests, get the page first (probably not mandatory, but it won't hurt, so why not), and then send the payload to the right address; you will receive your data. It probably won't be HTML but some kind of JSON, which is even better because it is easier to work with later on.
If you do need the HTML version, there are Python bindings to libraries such as PhantomJS. You can use them to render the page and then check for the existence of specific elements.
You can also use Selenium, a library that lets you control your browser; you can even watch it work. It drives a real browser, so it can handle any kind of webpage, SPA or otherwise.
It all depends on your needs. If you're after the pure data, I would go with the first solution; if you would like to mimic a user, then Selenium is by far the simplest.
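A minimal sketch of the first approach with requests; the site URL, endpoint, and payload below are hypothetical stand-ins for whatever the Network tab actually shows:
import requests

session = requests.Session()
# load the page first so the session picks up any cookies it sets
session.get('https://example.com/app')
# replay the XHR observed in the dev tools Network tab
resp = session.post(
    'https://example.com/api/search',  # hypothetical endpoint
    json={'query': 'my search term'},  # hypothetical payload
)
print(resp.json())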
Below is example usage of Selenium, excerpted from their website:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait  # available since 2.4.0
from selenium.webdriver.support import expected_conditions as EC  # available since 2.26.0

# Create a new instance of the Firefox driver
driver = webdriver.Firefox()

# go to the google home page
driver.get("http://www.google.com")

# the page is ajaxy, so the title is originally this:
print(driver.title)

# find the element whose name attribute is q (the google search box)
inputElement = driver.find_element_by_name("q")

# type in the search
inputElement.send_keys("cheese!")

# submit the form (although google automatically searches now without submitting)
inputElement.submit()

try:
    # we have to wait for the page to refresh; the last thing that seems to be updated is the title
    WebDriverWait(driver, 10).until(EC.title_contains("cheese!"))
    # You should see "cheese! - Google Search"
    print(driver.title)
finally:
    driver.quit()

Selenium - 'site' object has no attribute 'find_element_by_link_text'

I'm trying to write a Python script that clicks a certain link in a table on a webpage. The only way I have to select this particular link is its link text, but Selenium keeps telling me that find_element_by_link_text doesn't exist, even though it's found not only in the official Selenium docs but also in multiple online Selenium examples. Here's the code snippet:
hac.find_element_by_link_text("View this year's Report Cards").click()
I cross-checked my Selenium installation with the one from the website and they seem to be the same. Was this feature deprecated, or am I just missing something? I'm using Selenium v2.45.0 and Python v2.7.
You need to call the find_element_by_link_text() method on the driver object.
Here is a sample script that opens the Python home page, locates the link to the About page using its link text, and then clicks that link:
from selenium import webdriver
driver = webdriver.Firefox()
driver.get("http://www.python.org")
driver.implicitly_wait(10)
elem = driver.find_element_by_link_text("About")
driver.implicitly_wait(10)
elem.click()
This page of the Selenium docs gives an overview of all of the find_element methods available, and shows how to call those methods.
If you are using Python Selenium 4.3 or newer, the old locator shortcuts such as find_element_by_*() and find_elements_by_*() have been removed; use find_element()/find_elements() with a locator strategy instead.
Examples:
from selenium.webdriver.common.by import By

driver.find_element(By.NAME, "element_name")
driver.find_element(By.XPATH, "xpath_here")
