I am trying to send a few POST/GET requests and then transfer the session/cookies to a Chrome webdriver with Selenium. As I am new to Selenium, I'm unsure how to accomplish this. I have seen how people transfer a Selenium session to requests, but that's not what I need. Any help would be appreciated, as I have looked everywhere and can't seem to find a solution.
import requests
from selenium import webdriver
driver = webdriver.Chrome()
s = requests.Session()
s.post('http://example.com')
s.get('http://example.com')
# need to transfer session/cookies to driver
driver.get('http://example.com')
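One possible approach, offered as a sketch rather than a confirmed solution: a requests session keeps its cookies in a standard cookie jar, and Selenium's driver.add_cookie() takes one dict per cookie, so the cookies can be copied over one by one. The helper names cookie_to_selenium and copy_cookies are hypothetical. The sketch assumes the driver has already loaded a page on the cookie's domain, since browsers generally reject cookies for a domain they are not currently on.

```python
from http.cookiejar import Cookie, CookieJar


def cookie_to_selenium(cookie: Cookie) -> dict:
    """Convert one cookie-jar entry into the dict shape add_cookie() accepts."""
    data = {
        "name": cookie.name,
        "value": cookie.value,
        "path": cookie.path,
        "secure": bool(cookie.secure),
    }
    if cookie.domain:
        data["domain"] = cookie.domain
    if cookie.expires is not None:
        # Selenium calls the expiration timestamp "expiry" (seconds since epoch)
        data["expiry"] = int(cookie.expires)
    return data


def copy_cookies(jar: CookieJar, driver) -> None:
    """Copy every cookie from a jar (e.g. requests.Session().cookies) to a driver.

    Assumption: driver.get() was already called for the cookies' domain.
    """
    for cookie in jar:
        driver.add_cookie(cookie_to_selenium(cookie))


# Intended use with the names from the question (not executed here):
#   driver.get('http://example.com')   # visit the domain first
#   copy_cookies(s.cookies, driver)    # s is the requests.Session
#   driver.get('http://example.com')   # reload with the cookies applied
```

Only cookies travel this way; headers or other state held by the requests session are not transferred.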
Related
This is the first time I have worked with Selenium, and my problem is that I can't open a website in Edge using Selenium.
from selenium import webdriver
driver = webdriver.Edge(executable_path="/Program Files (x86)/Microsoft/Edge/Application/msedge.exe")
url = "https://www.google.com/"
driver.get(url)
Please help me as soon as possible. Thank you.
I'm trying to navigate to the following page and extract the HTML: https://www.automobile.it/annunci?b=data&d=DESC. However, every time I call the get() method, the website redirects me to another page, always the same one: https://www.automobile.it/torrenova?radius=100&b=data&d=DESC.
Here's the simple code I'm running:
from selenium import webdriver
driver = webdriver.Chrome(executable_path=ex_path)
driver.get("https://www.automobile.it/annunci?b=data&d=DESC")
html=driver.page_source
If I do the same thing using the requests module, I don't get redirected:
import requests
html=requests.get("https://www.automobile.it/annunci?b=data&d=DESC")
I don't understand why it's behaving like this. Any ideas?
Use driver.delete_all_cookies()
from selenium import webdriver
driver = webdriver.Chrome(executable_path=ex_path)
driver.delete_all_cookies()
driver.get("https://www.automobile.it/annunci?b=data&d=DESC")
html=driver.page_source
PS: Also be warned that page_source will not necessarily give you the complete DOM as rendered.
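To illustrate the PS, here is a minimal sketch that waits until the browser reports the document as fully loaded before reading page_source. The names is_dom_ready and rendered_source are hypothetical, and the 10-second timeout is an arbitrary assumption. Note that document.readyState being "complete" only means the initial load finished; content injected later by scripts may still be missing.

```python
def is_dom_ready(driver) -> bool:
    """True once the browser reports the document as fully loaded."""
    return driver.execute_script("return document.readyState") == "complete"


def rendered_source(driver, timeout: float = 10.0) -> str:
    """Wait for the load to finish, then return the page source."""
    # Imported here so the helper above stays usable without Selenium installed
    from selenium.webdriver.support.ui import WebDriverWait

    # until() calls is_dom_ready(driver) repeatedly until it returns True
    WebDriverWait(driver, timeout).until(is_dom_ready)
    return driver.page_source
```

With the driver from the answer above, this would be called as html = rendered_source(driver) after driver.get(...).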
Well, you can clear the browser cache using the code below.
I am assuming that you are using Chrome.
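from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome(executable_path=ex_path)
# Open Chrome's "Clear browsing data" dialog and confirm it with Enter
driver.get('chrome://settings/clearBrowserData')
# find_element(By.XPATH, ...) replaces find_element_by_xpath, which newer Selenium 4 releases removed
driver.find_element(By.XPATH, '//settings-ui').send_keys(Keys.ENTER)
driver.get("https://www.automobile.it/annunci?b=data&d=DESC")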
I'm trying to make a bot that visits my adfly link using the Chrome webdriver. Every time I run the code below, though, the webdriver tells me that there were too many redirects and doesn't follow through. The code below is just for testing at the moment:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("--proxy-server="+"http://102.129.249.120:8080")
browser = webdriver.Chrome(options=options)
browser.get("http://raboninco.com/18Whc")
Okay, so I figured it out. I can use Tor with Selenium to get access to adfly, and it works great. Thanks for the help and time, guys. If you want to see the code I used, here it is:
from selenium import webdriver
import os
os.popen(r'C:\Users\joldb\Desktop\Tor Browser\Browser\TorBrowser\Tor\tor.exe')
options = webdriver.ChromeOptions()
options.add_argument("--proxy-server="+"socks5://127.0.0.1:9050")
browser = webdriver.Chrome(options=options)
browser.get("http://raboninco.com/18Whc")
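A small addition to the code above, as a sketch: os.popen returns immediately, so Chrome may start before Tor is listening. The hypothetical helper wait_for_port below polls the SOCKS port (9050, taken from the code above) until it accepts connections. An open port does not guarantee that Tor has finished building a circuit, so treat this as a minimum check; the 30-second default timeout is an assumption.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Poll host:port until a TCP connection succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


# Intended use between starting tor.exe and creating the browser (not executed here):
#   if not wait_for_port("127.0.0.1", 9050):
#       raise RuntimeError("Tor did not open its SOCKS port in time")
```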
I am writing a web crawler to get information from http://www.caam.org.cn/hyzc, but it returns HTTP Error 302, and I cannot fix it.
https://imgur.com/a/W0cykim
The picture gives a rough idea of this website's unusual behavior: when you browse to it, a window pops up telling you that the site is "accelerating" because so many people are online, and then it redirects you to the actual website. As a result, when I use a web crawler, all I get is the content of this window and nothing from the site itself. I think this is an effective way for the site's maintainers to keep web crawlers out, so I would like your help getting useful information from this website.
At first I used Python's requests library for my web crawler and only got the content of that window. The results are shown here: https://imgur.com/a/GLcpdZn
Then I disabled redirects and got HTTP Error 303, as shown here:
https://imgur.com/a/6YtaVOt
This is the latest code I used:
import requests
def getpage(url):
    try:
        r = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=10)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except:
        return "try again"
url = "http://www.caam.org.cn/hyzc"
print(getpage(url))
The expected outcome is to get useful information from the website http://www.caam.org.cn/hyzc. We may need to deal with the pop-up window.
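To see where the 302/303 responses actually lead, it can help to print the redirect chain. The sketch below relies on the history attribute that requests sets on a response; the function names describe_redirects and fetch_and_describe are hypothetical, and the headers and timeout are copied from the code above.

```python
def describe_redirects(response) -> list:
    """Return one 'status url' line per hop, ending with the final response."""
    hops = list(response.history) + [response]
    return [f"{hop.status_code} {hop.url}" for hop in hops]


def fetch_and_describe(url: str) -> list:
    """Fetch url with requests (redirects followed) and describe every hop."""
    import requests  # imported here so describe_redirects works on any response-like object

    r = requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}, timeout=10)
    return describe_redirects(r)


# Intended use (not executed here):
#   for line in fetch_and_describe("http://www.caam.org.cn/hyzc"):
#       print(line)
```

If the chain ends on the "accelerating" page, the real content is probably produced by JavaScript or a cookie check that requests alone does not perform.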
It looks like this website has some kind of protection against crawlers that use requests: the page is not entirely loaded when you send a GET request.
You can try to emulate a browser using selenium:
from selenium import webdriver
driver = webdriver.Chrome()
driver.get('http://www.caam.org.cn/hyzc')
print(driver.page_source)
driver.close()
driver.page_source will contain the page source.
You can learn how to set up the Selenium webdriver here.
I added something to delay the closure of my web crawler, and this worked. I want to share my code in case you run into a similar problem in the future:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
options = Options()
driver = webdriver.Chrome(options=options)
driver.get('http://www.caam.org.cn')
# Grab the interstitial page's <body>, then wait until it goes stale,
# i.e. until the browser has replaced it with the real page
body = driver.find_element(By.TAG_NAME, "body")
wait = WebDriverWait(driver, 5, poll_frequency=0.05)
wait.until(EC.staleness_of(body))
print(driver.page_source)
driver.close()
Would it be possible to automate the process of logging into a site in the Firefox browser, copying the cookie, and storing it in a DB table?
Scratching my head with this one.
A simple way to achieve this is through browser automation.
from selenium import webdriver
driver = webdriver.Firefox(executable_path='{/path/to/geckodriver}')
driver.get("http://google.com")
cookies = driver.get_cookies()
Note: Download a geckodriver version compatible with your Firefox.
Edit: I completely overlooked the fact that Python's requests module has a cookies attribute. This would be quicker than browser automation.
import requests
resp = requests.get("http://google.com")
cookies = resp.cookies._cookies
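Neither snippet covers the "place it in a DB table" part of the question. Here is a minimal sketch using the standard library's sqlite3, assuming the cookies arrive as a list of dicts in the shape returned by driver.get_cookies(). The table layout and the save_cookies name are assumptions made for illustration.

```python
import sqlite3


def save_cookies(conn: sqlite3.Connection, cookies: list) -> None:
    """Insert or update cookie dicts (as returned by driver.get_cookies())."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS cookies (
            name   TEXT NOT NULL,
            value  TEXT NOT NULL,
            domain TEXT NOT NULL,
            path   TEXT NOT NULL,
            expiry INTEGER,
            PRIMARY KEY (name, domain, path)
        )
        """
    )
    # Fill in defaults so every row has the same keys for the named placeholders
    rows = [
        {
            "name": c["name"],
            "value": c["value"],
            "domain": c.get("domain", ""),
            "path": c.get("path", "/"),
            "expiry": c.get("expiry"),
        }
        for c in cookies
    ]
    # INSERT OR REPLACE overwrites an existing cookie with the same name/domain/path
    conn.executemany(
        "INSERT OR REPLACE INTO cookies (name, value, domain, path, expiry) "
        "VALUES (:name, :value, :domain, :path, :expiry)",
        rows,
    )
    conn.commit()
```

With Selenium this would be called as save_cookies(sqlite3.connect("cookies.db"), driver.get_cookies()), where cookies.db is a placeholder file name.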