How do I set cookies using Python urlopen?

I am trying to fetch an HTML page using Python's urlopen.
I am getting this error:
HTTPError: HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop
The code:
from urllib2 import Request, urlopen

request = Request(url)
response = urlopen(request)
I understand that the server redirects to another URL and that it is looking for a cookie.
How do I set the cookie it is looking for so I can read the HTML?

Here's an example from the Python documentation, adjusted to your code:
import cookielib, urllib2

# Store cookies in a CookieJar and build an opener that sends them
# back automatically on every request, including after redirects.
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
request = urllib2.Request(url)
response = opener.open(request)
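If you are on Python 3, the same pieces live in http.cookiejar and urllib.request; a minimal sketch of the equivalent code (assuming the same url variable):

import http.cookiejar, urllib.request

# Python 3 equivalent: cookielib became http.cookiejar, urllib2 became urllib.request.
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
request = urllib.request.Request(url)
response = opener.open(request)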

Related

Python post requests and read cookies in the response

I am writing a Python program which will send a post request with a password; if the password is correct, the server will return a special cookie, "BDCLND".
I did this in Postman first. You can see the URL, the headers, the password I used, and the response cookies in the snapshots below.
The response cookies didn't include "BDCLND" because the password 'ssss' was wrong. However, the server did send back a 'BAIDUID' cookie, and if I then sent another post request with the 'BAIDUID' cookie and the correct password 'v0vb', the "BDCLND" cookie showed up in the response.
Then I wrote the python program like this:
import requests

def main():
    url = "https://pan.baidu.com/share/verify?surl=pK753kf&t=1508812575130&bdstoken=null&channel=chunlei&clienttype=0&web=1&app_id=250528&logid=MTUwODgxMjU3NTEzMTAuMzM2MTI4Njk5ODczMDUxNw=="
    headers = {
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        "Referer": "https://pan.baidu.com/share/init?surl=pK753kf"
    }
    password = {'pwd': 'v0vb'}
    response = requests.post(url=url, data=password, headers=headers)
    cookieJar = response.cookies
    for cookie in cookieJar:
        print(cookie.name)
    response = requests.post(url=url, data=password, headers=headers, cookies=cookieJar)
    cookieJar = response.cookies
    for cookie in cookieJar:
        print(cookie.name)

main()
When I run this, the first for loop does print "BAIDUID", so that part is good. However, the second for loop prints nothing; it turns out the second cookie jar is just empty. I am not sure what I did wrong here. Please help.
Your second response has no cookies because you set the request cookies manually via the cookies parameter, so the server won't send a 'Set-Cookie' header back.
Passing cookies across requests with the cookies parameter is not a good idea; use a Session object instead.
import requests

def main():
    ses = requests.Session()
    ses.headers['User-Agent'] = 'Mozilla/5'
    url = "https://pan.baidu.com/share/verify?surl=pK753kf&t=1508812575130&bdstoken=null&channel=chunlei&clienttype=0&web=1&app_id=250528&logid=MTUwODgxMjU3NTEzMTAuMzM2MTI4Njk5ODczMDUxNw=="
    ref = "https://pan.baidu.com/share/init?surl=pK753kf"
    headers = {"Referer": ref}
    password = {'pwd': 'v0vb'}
    # First request: the session stores any cookies the server sets.
    response = ses.get(ref)
    cookieJar = ses.cookies
    for cookie in cookieJar:
        print(cookie.name)
    # Second request: the session sends the stored cookies automatically.
    response = ses.post(url, data=password, headers=headers)
    cookieJar = ses.cookies
    for cookie in cookieJar:
        print(cookie.name)

main()
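With a Session, the BAIDUID cookie set by the first GET is kept in ses.cookies and sent automatically with the POST, so the server can answer with a fresh Set-Cookie for BDCLND once the password is correct.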

Glassdoor API login not working with Python, response 403 Bots not allowed

Unfortunately I get the error: HTTP Status 403 - Bots not allowed while using the following Python code.
import requests
URL = 'http://api.glassdoor.com/api/api.htm?v=1&format=json&t.p={PartnerID}&t.k={Key}&action=employers&q=pharmaceuticals&userip={IP_address}&useragent=Mozilla/%2F4.0'
response = requests.get(URL)
print(response)
The URL does work when I try it from my browser. What can I do to make it work from code?
Update: SOLVED.
Apologies for not posting the question in the right way (I am new to SO).
According to this StackOverflow answer, you need to include a User-Agent header field (note that this example uses urllib2 rather than requests):
import urllib2

url = "http://api.glassdoor.com/api/api.htm?t.p=yourID&t.k=yourkey&userip=8.28.178.133&useragent=Mozilla&format=json&v=1&action=employers&q="
hdr = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(url, headers=hdr)
response = urllib2.urlopen(req)
With the requests module, the equivalent is probably:
import requests

URL = 'http://api.glassdoor.com/api/api.htm?v=1&format=json&t.p={PartnerID}&t.k={Key}&action=employers&q=pharmaceuticals&userip={IP_address}&useragent=Mozilla/%2F4.0'
headers = {'User-Agent': 'Mozilla/5.0'}
response = requests.get(URL, headers=headers)
print(response)
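One detail worth noting: the useragent parameter inside the Glassdoor URL is an API query parameter and is separate from the HTTP User-Agent header. The 403 comes from the missing header, which a browser sends automatically; that is why the same URL works when pasted into a browser.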

Python throwing 401 error when trying to access SharePoint lists

I have tried basic auth while following this link. I have also followed this question to get my code below using NTLM auth. I am still getting a 401 error. Is this an outdated way of pulling SharePoint lists, or is there something wrong with my code?
import requests
from requests_ntlm import HttpNtlmAuth
response = requests.get("https://example.com/_api/web/...", auth=HttpNtlmAuth('username', 'password'))
print(response.status_code)
Your approach is good enough.
Try a few changes as below:
import requests
from requests_ntlm import HttpNtlmAuth

url = "https://sharepointexample.com/"
user, pwd = "username", "pwd"
# Ask SharePoint's REST API for JSON instead of the default XML.
headers = {'accept': 'application/json;odata=verbose'}
r = requests.get(url, auth=HttpNtlmAuth(user, pwd), headers=headers)
print(r.status_code)
print(r.content)
Here you won't encounter a 401 response; instead you should get a 200, which indicates the request succeeded.
The response content will contain the list data, which you can then parse as JSON.
Hope this helps!
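A minimal sketch of that parsing, assuming the request targets a list endpoint such as /_api/web/lists (the endpoint and the Title field here are illustrative, not taken from the question):

import requests
from requests_ntlm import HttpNtlmAuth

# Hypothetical endpoint for illustration; adjust to your site and list.
url = "https://sharepointexample.com/_api/web/lists"
headers = {'accept': 'application/json;odata=verbose'}
r = requests.get(url, auth=HttpNtlmAuth("username", "pwd"), headers=headers)

# With odata=verbose, SharePoint wraps collections in a 'd' envelope
# containing a 'results' list.
for item in r.json()['d']['results']:
    print(item['Title'])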

How to disable SSL certificate verification in Python 3

I am new to Python. I have a script that tries to post something to a site; how do I disable SSL certificate verification in the script?
In Python 2, you can use
requests.get('https://kennethreitz.com', verify=False)
but I don't know how to do it in Python 3.
import urllib.parse
import urllib.request

url = 'https://something.com'
headers = {'APILOGIN': "user",
           'APITOKEN': "passwd"}
values = {"dba": "Test API Merchant", "web": "", "mids.mid": "ACH"}
data = urllib.parse.urlencode(values)
data = data.encode('utf-8')  # data should be bytes
req = urllib.request.Request(url, data, headers)
with urllib.request.urlopen(req) as response:
    the_page = response.read()
See Verifying HTTPS certificates with urllib.request. Note that since Python 3.4.3 (PEP 476), urlopen verifies HTTPS certificates by default, so to skip verification you have to pass an SSL context with the checks disabled. (The requests call with verify=False from your question also works unchanged in Python 3.)
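A minimal sketch of that for the code above (same placeholder URL):

import ssl
import urllib.request

# Build a context that skips certificate and hostname checks.
# Only do this for testing: it removes protection against
# man-in-the-middle attacks.
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

with urllib.request.urlopen('https://something.com', context=ctx) as response:
    the_page = response.read()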

urllib2 - post request returns 'your browser does not support iframes' error

I'm performing a simple POST request with urllib2 on an HTTPS URL; I have one parameter and a JSESSIONID from a logged-in user. However, when I POST I get a 'your browser does not support iframes' error with HTTP status 200.
import cookielib
import urllib
import urllib2

url = 'https://.../template/run.do?id=4'
http_header = {
    "JSESSIONID": "A4604B1CFA8D2B5A8296AAB3B5EADC0C;",
}
params = {
    'id': 4
}

# setup cookie handler
cookie_jar = cookielib.LWPCookieJar()
cookie = urllib2.HTTPCookieProcessor(cookie_jar)
opener = urllib2.build_opener(cookie)

req = urllib2.Request(url, urllib.urlencode(params), http_header)
res = urllib2.urlopen(req)
print res.read()
I keep triggering this method using cURL with no problem, but somehow can't via urllib. I DID try using all the request headers the browser uses, but to no avail.
I fear this might be a stupid misconception, but I've been wondering about it for hours!
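One thing worth checking (an observation, not a confirmed fix): JSESSIONID is a cookie rather than a header name of its own, so the server likely expects it inside a Cookie header, and because the request goes through urllib2.urlopen instead of opener.open, the cookie handler built above is never used. A minimal sketch of sending the session id as a cookie (placeholder host, since the real URL is elided above):

import urllib
import urllib2

# Hypothetical host for illustration; the question's real URL is elided.
url = 'https://example.com/template/run.do?id=4'
http_header = {
    "Cookie": "JSESSIONID=A4604B1CFA8D2B5A8296AAB3B5EADC0C",
}
params = {'id': 4}

req = urllib2.Request(url, urllib.urlencode(params), http_header)
res = urllib2.urlopen(req)
print res.read()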
