I am using the Requests module to send GET and POST requests to websites and then processing their responses. If the Response.text meets a certain criterion, I want it to be opened in a browser. Currently I do this with the Selenium package by resending the request to the webpage via the Selenium webdriver. However, this feels inefficient since I have already obtained the response once, so is there a way to render the obtained Response object directly in the browser opened via Selenium?
EDIT
A hacky way I could think of is to write the response.text to a temporary file and open that in the browser. Please let me know if there is a better way to do it than this.
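For reference, this is roughly what I mean (a rough sketch, assuming a Chrome driver and a response object obtained with Requests):
import tempfile
from selenium import webdriver

# write the already-fetched HTML to a temporary file
with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False, encoding="utf-8") as f:
    f.write(response.text)
    path = f.name

# have the Selenium-controlled browser render the local file
driver = webdriver.Chrome()
driver.get("file://" + path)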
To directly render some HTML with Selenium, you can use the data scheme with the get method:
from selenium import webdriver
import requests

# .text gives a decoded string; .content (bytes) cannot be concatenated to the data URI
content = requests.get("http://stackoverflow.com/").text
driver = webdriver.Chrome()
driver.get("data:text/html;charset=utf-8," + content)
Or you could write the page with a piece of script:
from selenium import webdriver
import requests

# fetch the page once, then inject its HTML into a blank document
content = requests.get("http://stackoverflow.com/").text
driver = webdriver.Chrome()
driver.execute_script("""
    document.location = 'about:blank';
    document.open();
    document.write(arguments[0]);
    document.close();
""", content)
Related
Is it possible to send a get request to a webdriver using selenium?
I want to scrape a website with infinite scrolling and collect a substantial number of the objects on it. For this I use Selenium to open the website in a webdriver and scroll down the page until enough objects are visible.
However, I'd like to parse the information on the page with BeautifulSoup, since that is the most effective way in this case. If the GET request is sent in the normal way (see the code), the response only holds the first objects and not the objects from the scrolled-down page (which makes sense).
But is there any way to send a GET request to an open webdriver?
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
import requests
import time

# Opening the website in the webdriver
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get(url)

# Loop for scrolling
scroll_start = 0
for i in range(100):
    scroll_end = scroll_start + 1080
    driver.execute_script(f'window.scrollTo({scroll_start}, {scroll_end})')
    time.sleep(2)
    scroll_start = scroll_end

# The get request
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
You should probably find out what endpoint the website is using to fetch the data for the infinite scrolling.
Go to the website, open the dev tools, open the Network tab, and find the HTTP request that is fetching the content you're after; then maybe you can use it too. Just know that there are a lot of variables: are they using some sort of authorization for their APIs? Are the APIs returning JSON, XML, HTML, ...? Also, I am not sure whether this counts as fair use. A rough sketch of the idea is below.
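For illustration only, assuming the scrolling triggers a paginated JSON endpoint; the URL, parameters, and response keys here are made up and would have to be taken from what you actually see in the Network tab:
import requests

# hypothetical endpoint discovered in the Network tab
api_url = "https://example.com/api/items"

items = []
for page in range(1, 10):
    resp = requests.get(api_url, params={"page": page})
    resp.raise_for_status()
    items.extend(resp.json()["items"])  # key depends on the actual API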
I want to open a URL using a Python script, and then the same script should fill in the form but not submit it.
For example, the script should open https://www.facebook.com/ and fill in the name and password fields, but not submit the form.
You can use Selenium to get it done smoothly. Here is the sample code with Google search:
from selenium import webdriver
browser = webdriver.Firefox()
browser.get("http://www.google.com")
browser.find_element_by_id("lst-ib").send_keys("book")
# browser.find_element_by_name("btnK").click()
The last line is commented out intentionally, in case you do not want to submit the search.
Many websites don't allow web scraping; it may even expose you to an unauthorized-access claim.
That said, try the requests library in Python; you'll find this kind of thing easy to do with it.
https://realpython.com/python-requests/
import requests

payload = {'inUserName': 'USERNAME/EMAIL', 'inUserPass': 'PASSWORD'}
url = 'http://www.locationary.com/home/index2.jsp'
requests.post(url, data=payload)
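If you need to stay logged in for later requests, a requests.Session keeps the cookies for you; a minimal sketch, reusing the same (hypothetical) payload and url:
import requests

session = requests.Session()
session.post(url, data=payload)   # login cookies are stored on the session
next_page = session.get(url)      # subsequent requests reuse those cookies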
I am creating a script for this site:
The first section (making the account) is done:
https://my.shaadi.com/profile-creation/step/1?gtrk=1
However, when configuring profiles I am having an issue: the page is loaded by JS and the token is generated using JS as well. This is the JS file:
https://my.shaadi.com/static/js/main.4c82cc30.js
X-Access-Token: 2a719ecb4cf7a3ef45676834a596bc58|4SH80109362|
X-App-Key: 69c3f1c1ea31d60aa5516a439bb65949cf3f8a1330679fa7ff91fc9a5681b564
These are the two tokens I am looking to get.
I can't figure out a way of getting these. Is it possible to use requests to do this, or would it require a headless browser to run the JS? (I want to do it in pure Python requests.)
The best/easiest approach is to use Selenium or dryscrape together with BeautifulSoup.
#from bs4 import BeautifulSoup
from selenium import webdriver

client = webdriver.PhantomJS()
#client.get('https://my.shaadi.com/profile-creation/step/1?gtrk=1')
# load the JS bundle that generates the tokens
client.get('https://my.shaadi.com/static/js/main.4c82cc30.js')
body = client.page_source
Now you can parse body with a regexp or BeautifulSoup. A rough sketch of the regexp route is below.
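For illustration only; the pattern below is an assumption about how the token might appear in the source, not something taken from the actual JS file:
import re

# hypothetical pattern; adjust it to whatever the real token looks like
match = re.search(r'"X-Access-Token"\s*:\s*"([^"]+)"', body)
if match:
    access_token = match.group(1)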
I want to retrieve data from a website named myip.ms. I'm using requests to send data to the form, and then I want the response page back. When I run the script it returns the same page (the homepage) in the response, but I want the next page for the query I provide. I'm new to web scraping. Here's the code I'm using:
import requests
from urllib.parse import urlencode

payload = {
    'name': 'educationmaza.com',
    'value': 'educationmaza.com',
}
payload = urlencode(payload)

r = requests.post("http://myip.ms/s.php", data=payload)

infile = open("E://abc.html", 'wb')
infile.write(r.content)
infile.close()
I'm no expert, but it appears that when you interact with the webpage, the POST is handled by jQuery, which requests does not cope with well.
As such, you would have to use the Selenium module to interact with it.
The following code will execute as desired:
from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get("https://myip.ms/s.php")
driver.find_element_by_id("home_txt").send_keys('educationmaza.com')
driver.find_element_by_id("home_submit").click()
html = driver.page_source
infile=open("stack.html",'w')
infile.write(html)
infile.close()
You will have to install the Selenium package, as well as PhantomJS.
I have tested this code, and it works fine. Let me know if you need any further help!
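One caveat: if the results take a moment to render after the click, page_source may be captured too early. An explicit wait is one way to guard against that (a sketch; "results_table" is a placeholder id, not taken from the actual page):
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# wait up to 10 seconds for the (hypothetical) results element to appear
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "results_table"))
)
html = driver.page_source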
I know the content type can be obtained with:
import urllib2

response = urllib2.urlopen(url)
content_type = response.info().getheader('Content-type')
Now I need to execute JS code, so I chose Selenium with PhantomJS to fetch the web page.
from selenium import webdriver

driver = webdriver.PhantomJS()
driver.get(url)
source = driver.page_source
How can I get the content type from source without downloading the web page twice? I know I could save response.read() as an HTML file and then have the driver render the local file without downloading it again, but that's too slow. Any suggestions?
Selenium does not expose the response headers, but you can just request the headers with a HEAD request via requests:
import requests
print(requests.head(url).headers["Content-Type"])
You can also use httplib2, urllib2, etc.; there are numerous answers here showing how to request the headers with various libs.
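For example, a minimal sketch with the standard library's httplib (Python 2, to match the urllib2 usage above; assumes url is a plain http:// URL):
import httplib
from urlparse import urlparse

parsed = urlparse(url)
conn = httplib.HTTPConnection(parsed.netloc)
conn.request("HEAD", parsed.path or "/")
print(conn.getresponse().getheader("Content-Type"))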