Selenium cannot submit an answer on zhihu.com - Python

URL: https://www.zhihu.com/question/305744720/answer/557418746
With Selenium I cannot submit an answer; it only works when done by a human.
import random
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
# Attach to a Chrome instance already running with --remote-debugging-port=9222
options.add_experimental_option("debuggerAddress", "127.0.0.1:9222")
browser = webdriver.Chrome(executable_path=r'.\chromedriver.exe', chrome_options=options)

# Click the blue "write answer" button (indexing [4] needs at least five matches)
button_li = browser.find_elements_by_class_name('Button--blue')
if len(button_li) > 4:
    print(len(button_li))
    button_ele = button_li[4]
    button_ele.click()
time.sleep(random.uniform(0.5, 3))

# Focus the answer editor, then insert a text node into the Draft.js block
browser.find_element_by_css_selector('div.AnswerForm-editor').click()
time.sleep(random.uniform(0.5, 2))
js = """
var div = document.getElementsByClassName('public-DraftStyleDefault-block');
var text = document.createTextNode("君");
div[0].firstChild.appendChild(text);
"""
browser.execute_script(js)

# Submit the answer
browser.find_element_by_css_selector('Button.Button.AnswerForm-submit').click()
Summary of the problem:
I managed to write the content into the answer box, but the site identified me as a machine. After that, my actions on the page stopped having any effect. How can I avoid being identified as a machine so that I can still use Selenium to select my elements?

I'm not sure where you are trying to select the submit button, but the following selector worked for me:
browser.find_element_by_css_selector('Button.Button.AnswerForm-submit').click()
With respect to being detected as a 'machine', it's not easy to avoid that.
There are several different ways they can detect you.
That said, here's one thing I found that can avoid some of the Selenium detection attempts. One thing sites look for is document variables called $cdc_ and $wdc_ that Selenium uses; for Chrome it is $cdc_. Here is what I suggest you try:
1) Download a hex editor if you don't already have one. I used one from here.
2) Open your chromedriver.exe in the hex editor.
3) Use the search functionality to find any instance of $cdc_ or $wdc_ and overwrite it with basically any other string ending in an underscore (keep the replacement the same length so the binary isn't corrupted). Myself, I found just one instance of $cdc_, and it looked like this:
'$cdc_asdjflasutopfhvcZLmcfl_'
I simply replaced it with 'Random_'.
Hopefully this works and you can now traverse the site unimpeded. If not, try the following as well; the only problem is that it might break your chromedriver file so that your tests no longer work. It could still be worth a try, and if it does break, you can easily download a fresh copy. Search the binary for any usage of the words 'selenium', 'webdriver', or 'chromedriver' and delete them; this is another way a site can tell you are using Selenium. A scripted version of the $cdc_ patch is sketched below.
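If you'd rather not hand-edit the binary every time you update chromedriver, here is a minimal sketch in Python that finds the $cdc_ marker and writes a patched copy with a same-length replacement. The file names and the replacement prefix are just placeholders; adjust them to your setup.
# Sketch: patch the $cdc_ marker in chromedriver with a same-length replacement.
with open('chromedriver.exe', 'rb') as f:
    data = f.read()

old = b'$cdc_'
new = b'$xyz_'  # any other same-length prefix ending in an underscore
if old in data:
    with open('chromedriver_patched.exe', 'wb') as f:
        f.write(data.replace(old, new))
    print('Patched copy written to chromedriver_patched.exe')
else:
    print('No $cdc_ marker found')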
Let me know if any of this helps or if you have any questions. It's hard for me to give a concrete answer because I don't know exactly how the site is detecting Selenium.

Related

Can I disallow images in Python Selenium by using "chrome://settings/content/images"?

I'm using Selenium to build some programs, and I found that I don't need images on certain pages. But on some pages I do need images to load.
Of course, I found the approach of using
{"profile.managed_default_content_settings.images": 2}
but this tells the browser not to load images on all pages.
So I was looking for a way to disable images only on specific pages, and I found that the browser itself can disallow them on the pages I choose.
It is under chrome://settings/content/images, where you can add URLs (I'm Korean, so I don't know exactly how it looks to you).
So I searched for how to set this Chrome setting, but there was no information that I needed.
Maybe it can be set through chrome_options in Selenium, but I can't find it in the Python Selenium docs.
And I know there is an approach like this (pseudocode):
driver.get("chrome://settings/content/images").find(cr-button id='addSite').click()
But I want something cleaner; I just want to set it once when the main code starts.
Can anybody help me?
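One avenue worth trying is the "prefs" channel of ChromeOptions. The sketch below blocks images globally and then adds a per-site exception; the exceptions key (profile.content_settings.exceptions.images) and its pattern syntax are assumptions drawn from Chrome's Preferences file, the domain is a placeholder, and the behaviour may vary between Chrome versions.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
prefs = {
    # Block images everywhere by default; the managed_* key from the question
    # behaves similarly, but policy-managed settings may ignore exceptions.
    "profile.default_content_setting_values.images": 2,
    # Assumed per-site exception format (mirrors Chrome's Preferences file);
    # 1 = allow. "example.com" is just a placeholder domain.
    "profile.content_settings.exceptions.images": {
        "https://example.com,*": {"setting": 1}
    },
}
options.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome(chrome_options=options)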

Enter isolation mode for selenium element using cdp

Currently, I have code that uses a remote Selenium server to look for ads and save their images, or take screenshots in case their image isn't easily available. As of now I use:
iframes = list(driver.find_elements_by_tag_name('iframe'))
to obtain each iframe and then loop through them doing iframe.screenshot(filename). This works fine for the most part, but some pages take bad screenshots (they capture other parts of the page, or sometimes an element is blocking the ad).
So what I'm trying to do is use the newly implemented Chrome DevTools Protocol support in Selenium to isolate each element, so that it's the only visible element on the screen. This should be possible, since I can open Chrome's dev tools, right-click my desired element and choose "enter isolation mode". I just don't know where to begin with the code, because the documentation is pretty messy and no one seems to have posted about this before.
I've found these methods: https://chromedevtools.github.io/devtools-protocol/tot/Overlay/#method-setShowIsolatedElements
https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-createIsolatedWorld
but I'm not sure how to even test them.
If anyone has experience working with this, or can think of a better solution, I'd greatly appreciate your help.
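One alternative to the Overlay isolation commands is to pass each element's bounding box as the clip parameter of Page.captureScreenshot through execute_cdp_cmd, so only that element ends up in the image. The sketch below assumes a local Chrome driver (execute_cdp_cmd is Chrome-only, so it may not apply to a remote server) and that the element's rect coordinates line up with the CDP viewport; treat it as a starting point rather than a drop-in replacement.
import base64
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://example.com")  # placeholder URL

for i, iframe in enumerate(driver.find_elements_by_tag_name("iframe")):
    rect = iframe.rect  # x, y, width, height in CSS pixels
    # Clip the screenshot to the element's bounding box via CDP; coordinates
    # may need adjusting for page scroll depending on the Chrome version.
    result = driver.execute_cdp_cmd("Page.captureScreenshot", {
        "clip": {
            "x": rect["x"],
            "y": rect["y"],
            "width": rect["width"],
            "height": rect["height"],
            "scale": 1,
        }
    })
    with open("iframe_%d.png" % i, "wb") as f:
        f.write(base64.b64decode(result["data"]))

driver.quit()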

Selenium and Python 3 – unable to find element

First off, apologies for a commonly asked question. I've looked through all the earlier examples but none of the answers seem to work in my situation.
I'm trying to locate the username and password fields from this website: http://epaper.bt.com.bn/
I've had no problems locating the "myprofile" element and clicking on it. It then loads a page into an iframe. Here's my problem: I've tried all the various methods like find_element_by_id('input_username'), find_element_by_name('username'), etc., and none of them work. I would appreciate it if someone could point me down the right path.
First, try switching to the iframe:
driver.switch_to.frame("iframe_login")
then you can find your elements. For example:
driver.find_element_by_id("input_username").send_keys("username")
To move back out of the iframe:
driver.switch_to.default_content()
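If the iframe is injected after the click, an explicit wait that switches as soon as the frame becomes available tends to be more robust than a plain switch. A short sketch, assuming the frame can still be addressed by the name "iframe_login" used above:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait up to 10 seconds for the login iframe, then switch into it automatically.
WebDriverWait(driver, 10).until(
    EC.frame_to_be_available_and_switch_to_it("iframe_login")
)
driver.find_element_by_id("input_username").send_keys("username")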

Selenium Webdriver for Python: get page, enter values, click submit, get source

Alright, I'm confused. So I want to scrape a page using Selenium Webdriver and Python. I've recorded a test case in the Selenium IDE. It has stuff like
Command    Target
click      link=14
But I don't see how to run that in Python. The desirable end result is that I have the source of the final page.
Is there a run_test_case command? Or do I have to write individual command lines? I'm rather missing the link between the test case and the actual automation. Every site tells me how to load the initial page and how to get stuff from that page, but how do I enter values and click on stuff and get the source?
I've seen:
submitButton=driver.find_element_by_xpath("....")
submitButton.click()
Ok. And enter values? And get the source once I've submitted a page? I'm sorry that this is so general, but I really have looked around and haven't found a good tutorial that actually shows me how to do what I thought was the whole point of Selenium Webdriver.
I've never used the IDE. I just write my tests or site automation by hand.
from selenium import webdriver
browser = webdriver.Firefox()
browser.get("http://www.google.com")
print browser.page_source
You could put that in a script and just do python wd_script.py or you could open up a Python shell and type it in by hand, watch the browser open up, watch it get driven by each line. For this to work you will obviously need Firefox installed as well. Not all versions of Firefox work with all versions of Selenium. The current latest versions of each (Firefox 19, Selenium 2.31) do though.
An example showing logging into a form might look like this:
username_field = browser.find_element_by_css_selector("input[type=text]")
username_field.send_keys("my_username")
password_field = browser.find_element_by_css_selector("input[type=password]")
password_field.send_keys("sekretz")
browser.find_element_by_css_selector("input[type=submit]").click()
print browser.page_source
This kind of stuff is much easier to write if you know CSS well. Weird errors can be caused by trying to find elements that are being generated in JavaScript; you might be looking for them before they exist, for instance. It's easy enough to tell if this is the case by putting in a time.sleep for a little while and seeing if that fixes the problem. More elegantly, you can abstract some kind of general wait-for-element function, as sketched below.
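For reference, here is a minimal sketch of such a wait helper using WebDriverWait and expected conditions instead of time.sleep, reusing the username selector from the example above:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def wait_for_css(browser, selector, timeout=10):
    # Poll until the element exists in the DOM, or raise TimeoutException.
    return WebDriverWait(browser, timeout).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, selector))
    )

username_field = wait_for_css(browser, "input[type=text]")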
If you want to run Webdriver sessions as part of a suite of integration tests then I would suggest using Python's unittest to create them. You drive the browser to the site under test, and make assertions that the actions you are taking leave the page in a state you expect. I can share some examples of how that might work as well if you are interested.
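A minimal sketch of what such a unittest-based check might look like (the URL and the title assertion are placeholders):
import unittest
from selenium import webdriver

class GoogleHomepageTest(unittest.TestCase):
    def setUp(self):
        self.browser = webdriver.Firefox()

    def test_title_contains_google(self):
        # Drive the browser to the site under test and assert on the result.
        self.browser.get("http://www.google.com")
        self.assertIn("Google", self.browser.title)

    def tearDown(self):
        self.browser.quit()

if __name__ == "__main__":
    unittest.main()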

Selenium with Python, how do I get the page output after running a script?

I'm not sure how to find this information. I have found a few tutorials so far about using Python with Selenium, but none have so much as touched on this. I am able to run some basic test scripts through Python that automate Selenium, but it just shows the browser window for a few seconds and then closes it. I need to get the browser output into a string/variable (ideally), or at least save it to a file, so that Python can do other things with it (parse it, etc.). I would appreciate it if anyone could point me towards resources on how to do this. Thanks.
Using Selenium WebDriver and Python, you would simply access the .page_source property to get the source of the current page.
For example, using the Firefox() driver:
from selenium import webdriver
driver = webdriver.Firefox()
driver.get('http://www.example.com/')
print(driver.page_source)
driver.quit()
There's a Selenium.getHtmlSource() method in Java; most likely it is also available in Python. It returns the source of the current page as a string, so you can do whatever you want with it.
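For completeness, a sketch of the old Selenium RC Python client equivalent, assuming a Selenium server is running on port 4444 and the RC client is still bundled with your Selenium version (the URL is a placeholder):
from selenium import selenium  # the old Selenium RC client, not WebDriver

sel = selenium("localhost", 4444, "*firefox", "http://www.example.com/")
sel.start()
sel.open("/")
html = sel.get_html_source()  # source of the current page as a string
sel.stop()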
OK, so here is how I ended up doing this, for anyone who needs it in the future.
You have to use Firefox for this to work.
1) Create a new Firefox profile (not strictly necessary, but ideal so as to separate this from your normal Firefox usage). There is plenty of info on how to do this on Google; the exact steps depend on your OS.
2) Get the Firefox plugin: https://addons.mozilla.org/en-US/firefox/addon/2704/ (it automatically saves all pages for a given domain name). You need to configure it to save whichever domains you intend to auto-save.
3) Then just start the Selenium server with the profile you created (below is an example for Linux):
cd /root/Downloads/selenium-remote-control-1.0.3/selenium-server-1.0.3
java -jar selenium-server.jar -firefoxProfileTemplate /path_to_your_firefox_profile/
That's it. It will now save all the pages for a given domain whenever Selenium visits them. Selenium does create a bunch of garbage pages too, which you could delete with some simple regex parsing; from there, it's up to you how to manipulate the saved pages.
