I just tried to do URL requests with Selenium like this:
driver.get("https://example.com/")
but I'm confused about how to make the same request with parameters, the way I would with requests:
params = {
    'name': 'john-doe',
    'shop_id': '121323233443',
}
requests.get('https://example.com/', params=params)
requests is a back-end (HTTP) library and Selenium is a front-end automation library, so you cannot explicitly send plain HTTP calls with Selenium. A workaround is to use Selenium's JavaScript executor to make AJAX calls from inside the browser. This is not recommended, as it is an over-engineered design unless your use case specifically requires it:
https://stackoverflow.com/a/5665291/6793637
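For completeness, here is a minimal sketch of both workarounds, assuming a locally created Chrome driver and the placeholder example.com URL from the question: either encode the parameters into the URL yourself, or fire the AJAX call through the JavaScript executor mentioned above.
from urllib.parse import urlencode
from selenium import webdriver

driver = webdriver.Chrome()
params = {'name': 'john-doe', 'shop_id': '121323233443'}

# Workaround 1: encode the parameters into the query string and just navigate to it
driver.get('https://example.com/?' + urlencode(params))

# Workaround 2: run the request as an AJAX call inside the browser and hand
# the response body back to Python via execute_async_script's callback
body = driver.execute_async_script(
    "var url = arguments[0], done = arguments[arguments.length - 1];"
    "fetch(url).then(function(r) { return r.text(); }).then(done);",
    'https://example.com/?' + urlencode(params))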
Related
In my application, I have an API at localhost:8000/api/v0.1/save_with_post.
I've also written a Python script to make a POST request to that API.
### My script
import requests
url = 'http://localhost:8000/api/v0.1/save_with_post'
myobj = {'key': 'value'}
x = requests.post(url, data=myobj)
Is it possible to view the headers and body of the request in Chrome rather than debugging my application code?
You want Postman.
With Postman you can either generate a request to your service from Postman itself, or set up Postman as a proxy so you can see the requests that your API client is generating and the responses from the server.
If you want to view the response headers from the post request, have you tried:
>>> x.headers
Or you could just add headers yourself to your POST request as so:
h = {"Content-Type": "application/xml", ("etc")}
x = requests.post(url, data = myobj, headers = h)
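As a small addition (this is standard requests behavior, not specific to this API): the prepared request that was actually sent is attached to the response object, so you can inspect exactly what went over the wire without Chrome at all.
print(x.request.headers)  # headers that were actually sent with the POST
print(x.request.body)     # encoded request body, e.g. 'key=value'
print(x.headers)          # response headers
print(x.text)             # response body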
Well, I don't know if there is a way to view the request in Chrome DevTools directly (I don't think there is); however, I know two alternatives for seeing the request body and response:
1 - use Selenium with a Chrome webdriver
This will allow you to run Chrome automated by Python. You can then open a test page and run JavaScript in it to make your POST request.
See these for more info on how to do this:
1. https://selenium-python.readthedocs.io/getting-started.html
2. Getting the return value of Javascript code in Selenium
You will need the selenium-requests library to use the requests API together with Selenium (a short sketch follows at the end of this answer):
https://pypi.org/project/selenium-requests/
2 - use Wireshark
This program will let you see all the traffic going through your network card, so you will be able to monitor all the requests going back and forth. However, since Wireshark captures everything your network card sends or receives, it may be hard to pick out the specific request you want.
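Here is the selenium-requests sketch referred to above; it is based on the library's documented usage, with Chrome as the driver (the library also wraps the other webdrivers) and with the URL and payload borrowed from the question as placeholders.
from seleniumrequests import Chrome

driver = Chrome()
# request() works like requests.request(), but the call is made through the
# automated browser session, so its cookies and headers come along for free
response = driver.request('POST', 'http://localhost:8000/api/v0.1/save_with_post',
                          data={'key': 'value'})
print(response.status_code, response.text)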
I am creating a script for this site:
The first section (making the account) is done:
https://my.shaadi.com/profile-creation/step/1?gtrk=1
However, when configuring profiles I am having an issue: the page is loaded by JS and the token is generated using JS as well.
This is the JS file:
https://my.shaadi.com/static/js/main.4c82cc30.js
X-Access-Token: 2a719ecb4cf7a3ef45676834a596bc58|4SH80109362|
X-App-Key: 69c3f1c1ea31d60aa5516a439bb65949cf3f8a1330679fa7ff91fc9a5681b564
These are the two tokens I am looking to get.
I can't figure out a way of getting them. Is it possible to use requests to do this, or would it require a headless browser to run the JS? (I want to do it in pure Python requests.)
The best/easiest way is to use Selenium or dryscrape together with BeautifulSoup.
#from bs4 import BeautifulSoup
from selenium import webdriver
client = webdriver.PhantomJS()
#client.get('https://my.shaadi.com/profile-creation/step/1?gtrk=1')
client.get('https://my.shaadi.com/static/js/main.4c82cc30.js')
body = client.page_source
Now you can parse body with a regexp or BeautifulSoup.
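For instance, a hedged sketch of pulling a 64-character hex value (the shape of the X-App-Key above) out of the downloaded JS with a regexp; the exact pattern depends on how the bundle embeds the key, so treat it as an illustrative guess rather than the real expression.
import re

# Guess: look for a 64-character lowercase-hex string literal in the JS bundle
match = re.search(r'["\']([0-9a-f]{64})["\']', body)
if match:
    app_key = match.group(1)
    print(app_key)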
When I open this URL manually in my web browser, I see in my network console that three other requests are executed. It works.
call: www.my.url/publish_something
get this cmd
get that cmd
post that...
How can I do this with Python requests? That is, call the "main" URL only once and have all the sub-requests executed, like my web browser does.
publish_url = "www.my.url/publish_something"
r = self.session.get(publish_url, verify=False, params=p)
It seems that when I call this URL with the Python requests module, it does not execute the sub-requests.
When you open a URL in your browser, the browser:
- issues a GET request to that URL,
- parses the content,
- issues GET requests for each image tag and for each script, style, etc. tag mentioning an external source,
- executes the scripts, which may lead to more sub-requests and DOM modifications,
- and finally renders the final DOM.
When you send a GET request with Python (with python-requests, the urllib module or whatever), only the first of the above stages is performed, so if you want more you'll have to do it yourself (parsing the content, retrieving images, etc.).
Or you can use a headless browser like PhantomJS.
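If you do want to stay with plain requests, a rough sketch of doing the parsing stage yourself could look like the following; the URL is the placeholder from the question, and the tag/attribute list only covers the common cases (images, scripts, stylesheets). Note that anything triggered by JavaScript (stage 4) still will not happen, which is why a headless browser is suggested.
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

session = requests.Session()
main_url = "https://www.my.url/publish_something"  # placeholder from the question

resp = session.get(main_url, verify=False)
soup = BeautifulSoup(resp.text, "html.parser")

# Issue a GET for every external resource the page references,
# roughly mimicking stage 3 of what the browser does.
for tag, attr in (("img", "src"), ("script", "src"), ("link", "href")):
    for node in soup.find_all(tag):
        if node.get(attr):
            sub = session.get(urljoin(main_url, node[attr]), verify=False)
            print(sub.status_code, sub.url)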
I'm trying to crawl a website using the requests library. However, the particular website I am trying to access (http://www.vi.nl/matchcenter/vandaag.shtml) has a very intrusive cookie statement.
I am trying to access the website as follows:
from bs4 import BeautifulSoup as soup
import requests
website = r"http://www.vi.nl/matchcenter/vandaag.shtml"
html = requests.get(website, headers={"User-Agent": "Mozilla/5.0"})
htmlsoup = soup(html.text, "html.parser")
This returns a web page that consists of just the cookie statement with a big button to accept. If you try accessing this page in a browser, you find that pressing the button redirects you to the requested page. How can I do this using requests?
I considered using mechanize.Browser but that seems a pretty roundabout way of doing it.
Try setting:
cookies = dict(BCPermissionLevel='PERSONAL')
html = requests.get(website, headers={"User-Agent": "Mozilla/5.0"}, cookies=cookies)
This will bypass the cookie consent page and land you straight on the page.
Note: You can find the above by analyzing the JavaScript code that runs on the cookie consent page; it is a bit obfuscated, but it should not be difficult. If you run into the same type of problem again, take a look at what kind of cookies the JavaScript code executed in the relevant event handler sets.
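If you plan to make more than one request, a Session keeps that consent cookie (and the User-Agent header) attached automatically; a small sketch:
import requests

s = requests.Session()
s.headers.update({"User-Agent": "Mozilla/5.0"})
s.cookies.set("BCPermissionLevel", "PERSONAL")

html = s.get("http://www.vi.nl/matchcenter/vandaag.shtml")  # the cookie is sent on every request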
I have found this SO question which asks how to send cookies in a post using requests. The accepted answer states that the latest build of Requests will build CookieJars for you from simple dictionaries. Below is the POC code included in the original answer.
import requests
cookie = {'enwiki_session': '17ab96bd8ffbe8ca58a78657a918558'}
r = requests.post('http://wikipedia.org', cookies=cookie)
import requests
from bs4 import BeautifulSoup

a = requests.Session()
soup = BeautifulSoup(a.get("https://www.facebook.com/").content, "html.parser")
payload = {
    "lsd": soup.find("input", {"name": "lsd"})["value"],
    "email": "my_email",
    "pass": "my_password",
    "persistent": "1",
    "default_persistent": "1",
    "timezone": "300",
    "lgnrnd": soup.find("input", {"name": "lgnrnd"})["value"],
    "lgndim": soup.find("input", {"name": "lgndim"})["value"],
    "lgnjs": soup.find("input", {"name": "lgnjs"})["value"],
    "locale": "en_US",
    "qsstamp": soup.find("input", {"name": "qsstamp"})["value"]
}
soup = BeautifulSoup(a.post("https://www.facebook.com/", data=payload).content, "html.parser")
print([i.text for i in soup.find_all("a")])
I'm playing around with requests and have read several threads here on SO about it, so I decided to try it out myself.
I am stumped by this line: "qsstamp": soup.find("input", {"name": "qsstamp"})["value"]
because it returns empty, thereby causing an error.
However, looking at Chrome developer tools, this "qsstamp" is populated. What am I missing here?
The payload is everything shown in the form data in Chrome dev tools, so what is going on?
Using Firebug and searching for qsstamp gives matched results that direct to: Here
You can see: j.createHiddenInputs({qsstamp:u},v)
That means qsstamp is dynamically generated by JavaScript.
requests will not run JavaScript (since all it does is fetch that page's HTML). You may want to use something like dryscrape, or an emulated browser like Selenium.
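A hedged sketch of the Selenium route: load the login page in a real browser so the script that creates the hidden qsstamp input actually runs, then read the value out of the DOM. The CSS selector is an assumption based on the field name discussed above.
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://www.facebook.com/")

# The hidden input is created by JavaScript, so it only exists after the
# page's scripts have run inside a real browser.
qsstamp = driver.execute_script(
    "var el = document.querySelector('input[name=qsstamp]');"
    "return el ? el.value : null;")
print(qsstamp)
driver.quit()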