In Python I have:
cookies = dict(PHPSESSID='PHPSESSID=djsgkjhsdjkhj34',
               authchallenge='sdifhshdfiuh34234234',
               rishum='skdhfuihuisdhf-' + '10403111')
try:
    response = requests.get(url, headers=headers, cookies=cookies, allow_redirects=False)
But I want to use these cookie values only for the first request and then switch to the new ones the server sets. How can I do that?
The solutions I found don't use default cookies for the first request.
Note: I can't log in to the website automatically since it uses an auth challenge, so every time I log in manually and set those cookies for the first request only; then, when the server updates them, I want the updates to be reflected in my current cookies.
Example of how my website works:
At first I log in using reCAPTCHA and get temporary cookies.
For the first request in my app I want to use these temporary cookies (I already know them).
Later, with each request, I need to use the cookies from the previous response (they change with each request).
My current code:
def main():
    start_time = time.time()
    keep_running = True
    while keep_running:
        keep_running = execute_data()
        time.sleep(5.0 - ((time.time() - start_time) % 5.0))


def execute_data():
    url = 'https://me.happ.com/rishum/register/confirm'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.16; rv:84.0) Gecko/20100101 Firefox/84.0',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Accept-Language': 'en-US,en;q=0.5',
        'Accept-Encoding': 'gzip, deflate',
        'Connection': 'close'
    }
    cookies = dict(rishum='dsfsdf21312zxcasd-' + '39480523')
    try:
        response = requests.get(url, headers=headers, cookies=cookies, allow_redirects=False)
You've almost got it, but you're a bit off on your dictionary implementation.
This is what you are looking for:
cookies = {
    "PHPSESSID": "djsgkjhsdjkhj34",
    "authchallenge": "sdifhshdfiuh34234234",
    "rishum": 'skdhfuihuisdhf-' + '10403111'
}

try:
    response = requests.get(url, headers=headers, cookies=cookies, allow_redirects=False)
Edit: I see now that this isn't the issue; rather, you want to update cookies during a session. Here is a simple example of how to do so with requests.Session:
from requests import Session

s = Session()
s.cookies["foo"] = "bar"
r = s.get('https://google.com')

print("Before:")
for cookie in s.cookies:
    print(cookie)
print()

s.cookies["bo"] = "baz"
print("After:")
for cookie in s.cookies:
    print(cookie)
Edit #2:
To further answer your question, here is a better example of how you can update cookies (all of them, if needed) in a loop.
from requests import Session, cookies

s = Session()
b = s.get('https://google.com')

for cookie in s.cookies:
    print(cookie.value)

# Iterate over a copy so we can safely modify the cookie jar while looping.
for cookie in list(s.cookies):
    # We already have this cookie's info in the `cookie` variable, so delete it from the jar.
    del s.cookies[cookie.name]
    # You can update the values HERE
    # ...
    # Example:
    cookie_value = cookie.value.upper()
    # Then save the new cookie to the cookie jar.
    updated_cookie = cookies.create_cookie(domain=cookie.domain, name=cookie.name, value=cookie_value)
    s.cookies.set_cookie(updated_cookie)

for cookie in s.cookies:
    print(cookie.value)
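Tying this back to the original question: below is a minimal sketch (the URL and cookie names are taken from the question's code and are assumptions on my part) that seeds a Session with the known temporary cookies once and then simply keeps reusing the session, so every Set-Cookie the server sends replaces the old values in the jar. The while/sleep loop stands in for the question's 5-second polling loop.

import time
import requests

url = 'https://me.happ.com/rishum/register/confirm'  # URL from the question's code
headers = {'User-Agent': 'Mozilla/5.0'}

s = requests.Session()
# Seed the jar with the manually obtained temporary cookies (used for the first request only).
s.cookies.update({
    'PHPSESSID': 'djsgkjhsdjkhj34',
    'authchallenge': 'sdifhshdfiuh34234234',
    'rishum': 'skdhfuihuisdhf-10403111',
})

while True:
    # Any Set-Cookie headers in the response overwrite the matching entries
    # in s.cookies, so the next iteration automatically sends the server's
    # newest values instead of the seeded ones.
    response = s.get(url, headers=headers, allow_redirects=False)
    print(response.status_code, dict(s.cookies))
    time.sleep(5)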
http://order.uniteduk.co.uk/index.php/login
This is the website where I want to send login details and log myself in to extract other data. Can somebody help me do that using the POST method of requests?
I don't want to use Selenium.
This is what I have tried:
import requests

LOGIN = 'http://order.uniteduk.co.uk/index.php/login'
PROTECTED_PAGE = 'http://order.uniteduk.co.uk/index.php/home'

payload = {
    'username': 'username',
    'password': 'pwd'
}

s = requests.session()
response = s.post(LOGIN, data=payload)
print(response.text)

stuff = s.get(PROTECTED_PAGE)
print(stuff.text)
And this is what I am getting in return:
Not Acceptable!Not Acceptable!An appropriate representation of the requested resource could not be found on this server. This error was generated by Mod_Security.
This website uses a nonce for security in the login request. You can find the nonce on the login page and send it in the payload together with the username and password. Also add headers to your request.
import requests
from bs4 import BeautifulSoup

login_url = 'http://order.uniteduk.co.uk/index.php/login'
protected_page_url = 'http://order.uniteduk.co.uk/index.php/home'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate'
}

# Create Session.
s = requests.session()

# Get the nonce from the login page.
response = s.get(login_url, headers=headers)
soup = BeautifulSoup(response.text, 'lxml')
nonce = soup.find('input', {'id': 'woocommerce-login-nonce'}).get('value')

# Add Host, Origin and Referer to the headers.
headers['Host'] = 'order.uniteduk.co.uk'
headers['Origin'] = 'http://order.uniteduk.co.uk'
headers['Referer'] = 'http://order.uniteduk.co.uk/index.php/login'

# Add the nonce to the payload.
payload = {
    "username": "YourUsername",
    "password": "YourPassword",
    "woocommerce-login-nonce": nonce,
    "_wp_http_referer": "/furniture/my-account/",
    "login": "Login"
}

# Login request.
response = s.post(login_url, data=payload, headers=headers)
print(response.text)

# Protected page request.
stuff = s.get(protected_page_url, headers=headers)
print(stuff.text)
So I am using the requests module, and I am trying to change the proxy every time a request is made (e.g., GET and POST). I have a dictionary of all the proxies I want to use, but I am having trouble getting the requests to actually work while iterating through the dictionary. I understand how to send a request with a single proxy, but not how to change the proxy after every request. This is not the actual program I am trying to write, but it is similar to the task I am trying to accomplish:
BASE_URL = "Some url"
USER_AGENT = "Some user agent"
POST_URL = "Some url"
proxies = {
'https' : 'proxy1',
'https' : 'proxy2',
'https' : 'proxy...'
}
def req():
session = requests.Session()
session.headers = {'user-agent': USER_AGENT}
session.headers.update({'Referer': BASE_URL})
req = session.get(BASE_URL, proxies=curProxy)
session.headers.update({'x-csrftoken': req.cookies['csrftoken']})
login_data = {'DATA HERE'}
login = session.post(POST_URL, data=login_data, allow_redirects=True, proxies=curProxy)
session.headers.update({'x-csrftoken': login.cookies['csrftoken']})
cookies = login.cookies
# For each proxy in proxies
for proxy in proxies:
# Updating the proxy to use
curProxy = proxy
req()
Thanks in advance to all who reply. All help/input is greatly appreciated!
You don't need a dictionary for your proxies. Use a plain list:
proxies = ['proxy1', 'proxy2', ...]
Change your function req to accept the proxy as a parameter. Global variables are evil :)
def req(curProxy):
    ...
    req = session.get(BASE_URL, proxies={'http': curProxy, 'https': curProxy})
Then iterate:
for proxy in proxies:
    req(proxy)
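Putting the pieces together, here is a rough, self-contained sketch of the rotation loop; the URLs, user agent and login payload are placeholders carried over from the question, not a tested login flow:

import requests

BASE_URL = "Some url"        # placeholder from the question
POST_URL = "Some url"        # placeholder from the question
USER_AGENT = "Some user agent"

proxies = ['proxy1', 'proxy2', 'proxy3']  # plain list of proxy URLs

def req(cur_proxy):
    # Use the same proxy for both schemes for this request.
    proxy_map = {'http': cur_proxy, 'https': cur_proxy}
    session = requests.Session()
    session.headers.update({'user-agent': USER_AGENT, 'Referer': BASE_URL})

    resp = session.get(BASE_URL, proxies=proxy_map)
    session.headers.update({'x-csrftoken': resp.cookies['csrftoken']})

    login_data = {}  # the question's 'DATA HERE' payload goes here
    return session.post(POST_URL, data=login_data, allow_redirects=True, proxies=proxy_map)

# One request per proxy; itertools.cycle(proxies) would let you rotate forever.
for proxy in proxies:
    req(proxy)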
Sometimes when I try to get the HTML of a website with this code
import requests

url = "https://sit2play.com"
response = requests.get(url)
print(response.content)
I get this response:
<h3 class="ielte9">
The browser you're using is not supported. Please use a different browser like Chrome or Firefox.
How can I avoid this and get the real page content?
Add your user agent to the headers of the request:
headers = {
    'User-Agent': 'YOUR USER AGENT',
}

response = requests.get(url, headers=headers)
You can look up your user agent on many websites like this one.
Edit
If the solution above doesn't work for you, which might be because you are using an old version of requests, try this one:
headers = requests.utils.default_headers()
headers.update({
    'User-Agent': 'YOUR USER AGENT',
})

response = requests.get(url, headers=headers)
I have this URL whose content is produced this way (PHP; it's supposed to generate a random cookie on every request):
setcookie('token', md5(time()), time() + 99999);
if (isset($_COOKIE['token'])) {
    echo 'Cookie: ' . $_COOKIE['token'];
    die();
}
echo 'Cookie not set yet';
As you can see, the cookie changes on every reload/refresh of the page. Now I have a Python 3 script with three requests that are completely independent of each other:
import requests

def get_req_data(req):
    print('\n\ntoken: ', req.cookies['token'])
    print('headers we sent: ', req.request.headers)
    print('headers server sent back: ', req.headers)

url = 'http://migueldvl.com/heya/login/tests2.php'
headers = {
    "User-agent": 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:7.0.1) Gecko/20100101 Firefox/7.0.1',
    "Referer": 'https://www.google.com'
}

req1 = requests.get(url, headers=headers)
get_req_data(req1)
req2 = requests.get(url, headers=headers)
get_req_data(req2)
req3 = requests.get(url, headers=headers)
get_req_data(req3)
How can it be that we sometimes get the same cookie across different requests, when it is clearly programmed to change on every request?
If we:
import time
and add a
time.sleep(1)  # wait one second before the next request
between requests, the cookie changes every time, which is the right and expected behaviour. But my question is: why do we need this time.sleep(1) to be certain the cookie changes? Wouldn't separate requests be enough?
import urllib3
import io
from bs4 import BeautifulSoup
import re
import cookielib

http = urllib3.PoolManager()
url = 'http://www.example.com'
headers = urllib3.util.make_headers(keep_alive=True, user_agent='Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6')

r = http.urlopen('GET', url, preload_content=False)

# Parameters that are then passed to the POST request
params = {
    'login': '/shop//index.php',
    'user': 'username',
    'pw': 'password'
}
suche = {
    'id': 'searchfield',
    'name': 'suche',
}

# POST request incl. params (login); the response is in response.data
response = http.request('POST', url, params, headers)

suche = http.request('POST', site-to-search?, suche, headers)
html_suche = suche.data
print(html_suche)
I am trying to log in to a site with this code and then run a search. With this code I get a response saying that I am not logged in. How can I combine the two, so that I first log in and then search?
Thanks.
Web servers track browser-like client state by setting cookies, which the client must return. By default, urllib3 does not pretend to be a browser, so we need to do a little extra work to relay the cookie back to the server. Here's an example of how to do this with httpbin.org:
import urllib3

http = urllib3.PoolManager()

# httpbin does a redirect right after setting a cookie, so we disable redirects
# for this request.
r = http.request('GET', 'http://httpbin.org/cookies/set?foo=bar', redirect=False)

# Grab the set-cookie header and build our headers for our next request.
# Note: This is a simplified version of what a browser would do.
headers = {'cookie': r.getheader('set-cookie')}

print(headers)
# -> {'cookie': 'foo=bar; Path=/'}

r = http.request('GET', 'http://httpbin.org/cookies', headers=headers)

print(r.data)
# -> {
#      "cookies": {
#        "foo": "bar"
#      }
#    }
(Note: This recipe is useful and urllib3's documentation would benefit from having it. I'd appreciate a pull request which adds something to this effect.)
Another option, as mentioned by Martijn, is to use a higher-level library that behaves more like a browser. robobrowser looks like a great choice for this kind of work, but requests also manages cookies for you, and it uses urllib3 underneath. :)
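For comparison, a rough sketch of the same httpbin recipe using requests, where the Session object stores and resends the cookie for you:

import requests

s = requests.Session()

# The session records the Set-Cookie header for us (the redirect is simply followed here).
s.get('http://httpbin.org/cookies/set?foo=bar')
print(dict(s.cookies))
# -> {'foo': 'bar'}

r = s.get('http://httpbin.org/cookies')
print(r.text)
# -> {"cookies": {"foo": "bar"}}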