I am using the requests module and trying to change the proxy every time a request is made (e.g., GET and POST). I have a dictionary of all the proxies I want to use, but I am having trouble making the requests actually work while iterating through it. I understand how to send a request with a single proxy, but I am not sure how to switch proxies after every request. This is not the actual program I am writing, but it illustrates the task I am trying to accomplish:
BASE_URL = "Some url"
USER_AGENT = "Some user agent"
POST_URL = "Some url"
proxies = {
    'https' : 'proxy1',
    'https' : 'proxy2',
    'https' : 'proxy...'
}
def req():
    session = requests.Session()
    session.headers = {'user-agent': USER_AGENT}
    session.headers.update({'Referer': BASE_URL})
    req = session.get(BASE_URL, proxies=curProxy)
    session.headers.update({'x-csrftoken': req.cookies['csrftoken']})
    login_data = {'DATA HERE'}
    login = session.post(POST_URL, data=login_data, allow_redirects=True, proxies=curProxy)
    session.headers.update({'x-csrftoken': login.cookies['csrftoken']})
    cookies = login.cookies

# For each proxy in proxies
for proxy in proxies:
    # Updating the proxy to use
    curProxy = proxy
    req()
Thanks in advance to all who reply. All help/input is greatly appreciated!
You don't need a dictionary for your proxies. Use a plain list:
proxies = ['proxy1', 'proxy2', ...]
Change your function req to accept the proxy as a parameter. Global variables are evil :)
def req(curProxy):
    ...
    req = session.get(BASE_URL, proxies={'http': curProxy, 'https': curProxy})
Then iterate:
for proxy in proxies:
    req(proxy)
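To put the pieces together, here is a minimal sketch of the whole rotation. It keeps the placeholder BASE_URL, POST_URL, USER_AGENT and login fields from the question, and the csrftoken handling is an assumption about the target site:
import requests

BASE_URL = "Some url"
POST_URL = "Some url"
USER_AGENT = "Some user agent"
proxies = ['proxy1', 'proxy2']  # plain list of proxy addresses

def req(curProxy):
    # one fresh session per proxy
    session = requests.Session()
    session.headers.update({'user-agent': USER_AGENT, 'Referer': BASE_URL})
    # requests picks the proxy whose key matches the URL scheme, so set both
    proxy_map = {'http': curProxy, 'https': curProxy}
    resp = session.get(BASE_URL, proxies=proxy_map)
    session.headers.update({'x-csrftoken': resp.cookies['csrftoken']})
    login_data = {'field': 'DATA HERE'}  # placeholder for the real login form fields
    return session.post(POST_URL, data=login_data, allow_redirects=True, proxies=proxy_map)

for proxy in proxies:
    req(proxy)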
I need to use the urllib/urllib2 libraries to log in to a first website and retrieve the session cookie that will let me log in to the proper, final website. Using the requests library this is pretty straightforward (I did it to make sure I can actually access the website):
import requests
payload = {"userName": "username", "password": "password", "apiKey": "myApiKey"}
url = "https://sso.somewebsite.com/api/authenticateme"
session = requests.session()
r = session.post(url, payload)
# Now that I have a cookie I can actually access my final website
r2 = session.get("https://websiteineed.somewebsite.com")
I tried to replicate this behavior using the urllib/urllib2 libraries, but I keep getting HTTP Error 403: Forbidden:
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)
values = {"userId": username , "password": password, "apiKey": apiKey}
url = 'https://sso.somewebsite.com/api/authenticateme'
data = urllib.urlencode(values)
req = urllib2.Request(url, data)
resp = urllib2.urlopen(req)
req2 = urllib2.Request('https://download.somewebsite.com')
resp2 = urllib2.urlopen(req2)
I tried solutions I found here and here and here but none of them worked for me... I would appreciate any suggestions!
The reason the 'final page' was rejecting the request is that Python was adding 'User-agent': 'Python-urllib/2.7' to the headers. After removing this element I was able to log in to the website:
opener.addheaders.pop(0)
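An alternative sketch (under the same assumption that the server filters on the default User-Agent) is to replace the header with a browser-like value instead of popping it:
import urllib
import urllib2
import cookielib

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# addheaders is a list of (name, value) tuples; by default it contains the
# ('User-agent', 'Python-urllib/2.7') pair that the server appears to reject
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib2.install_opener(opener)

data = urllib.urlencode({'userId': 'username', 'password': 'password', 'apiKey': 'myApiKey'})
resp = urllib2.urlopen(urllib2.Request('https://sso.somewebsite.com/api/authenticateme', data))
resp2 = urllib2.urlopen('https://websiteineed.somewebsite.com')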
I am pretty new to Python and I am trying to make a web scraper for a website called mangadex. I am trying to get a login function working, but I can't seem to get the request part down. Can someone explain what I am doing wrong?
The search page is protected by the login page.
Here's my code:
import requests

def login(username: str, password: str):
    url = "https://mangadex.cc/login/ajax/actions.ajax.php?function=login&nojs=1"
    with requests.session() as session:
        payload = {
            "login_username": username,
            "login_password": password
        }
        session.post(url, data=payload)
        return session

def search(session, title):
    resp = session.get("https://mangadex.cc/search", params={"title": title})
    return resp.text

session = login("VALIDUSERNAME", "VALIDPASSWORD")
search(session, "foo")
the website: https://mangadex.cc/
First, the login url is wrong.
NO:
https://mangadex.cc/login/ajax/actions.ajax.php?function=login&nojs=1
YES:
https://mangadex.cc/ajax/actions.ajax.php?function=login
Second, the AJAX-Request requires a specific header.
x-requested-with: XMLHttpRequest
If you send an AJAX request without the x-requested-with header, it will respond that you have attempted a hack:
Hacking attempt... Go away.
Third, don't close the session.
Code:
def login(username: str, password: str):
    url = "https://mangadex.cc/ajax/actions.ajax.php?function=login"
    header = {'x-requested-with': 'XMLHttpRequest'}
    payload = {
        "login_username": username,
        "login_password": password,
    }
    session = requests.session()
    req = session.post(url, headers=header, data=payload)
    return session
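The search helper from the question can then reuse the returned (still open) session, for example:
def search(session, title):
    resp = session.get("https://mangadex.cc/search", params={"title": title})
    return resp.text

session = login("VALIDUSERNAME", "VALIDPASSWORD")
print(search(session, "foo"))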
I am trying to send a request to a URL through a proxy using the requests module of Python (3.6.5). The request completes successfully, but when I check the origin of the request (by printing req.content), it still shows my IP. Looking through examples on the Internet, I couldn't figure out what is behind this problem.
def send_request(url):
    header = get_random('UserAgents.txt')
    proxy = get_random('ProxyList.txt')
    print("Proxy: " + str(proxy))
    proxies = {
        'http' : 'http://' + str(proxy),
    }
    try:
        session = requests.Session()
        session.proxies = proxies
        session.headers = HEADER
        req = session.get(url)
        # req = requests.get(url, headers = { 'User-Agent' : HEADER
        # }, proxies = proxies)
        print(req.content)
        req.raise_for_status()
    except Exception as e:
        print(e)
        sys.exit()
    print('Request is successful!')
    return req
There is a possibility that your particular proxy does not hide your IP. You can try one of these free proxy services (results are in JSON format):
getproxylist
gimmeproxy
pubproxy
and see if that works. By the way, if what you want is anonymous web access, a VPN is far better than a proxy.
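To check whether a given proxy actually changes the visible origin, here is a small sketch using httpbin.org/ip as a neutral echo service (the proxy address below is a placeholder):
import requests

proxy = 'http://1.2.3.4:8080'  # placeholder proxy address

# requests picks the proxy entry whose key matches the URL scheme,
# so map both http and https to the proxy being tested
proxies = {'http': proxy, 'https': proxy}

print(requests.get('http://httpbin.org/ip').json())                   # your own IP
print(requests.get('http://httpbin.org/ip', proxies=proxies).json())  # IP seen through the proxy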
I am writing a Python program which sends a POST request with a password; if the password is correct, the server returns a special cookie, "BDCLND".
I did this in Postman first. You can see the url, headers, the password I used and the response cookies in the snapshots below.
The response cookies didn't include the "BDCLND" cookie because the password 'ssss' was wrong. However, the server did send a 'BAIDUID' cookie back. Now, if I sent another POST request with the 'BAIDUID' cookie and the correct password 'v0vb', the "BDCLND" cookie would show up in the response. Like this:
Then I wrote the Python program like this:
import requests
import string
import re
import sys

def main():
    url = "https://pan.baidu.com/share/verify?surl=pK753kf&t=1508812575130&bdstoken=null&channel=chunlei&clienttype=0&web=1&app_id=250528&logid=MTUwODgxMjU3NTEzMTAuMzM2MTI4Njk5ODczMDUxNw=="
    headers = {
        "Content-Type":"application/x-www-form-urlencoded; charset=UTF-8",
        "Referer":"https://pan.baidu.com/share/init?surl=pK753kf"
    }
    password={'pwd': 'v0vb'}
    response = requests.post(url=url, data=password, headers=headers)
    cookieJar = response.cookies
    for cookie in cookieJar:
        print(cookie.name)
    response = requests.post(url=url, data=password, headers=headers, cookies=cookieJar)
    cookieJar = response.cookies
    for cookie in cookieJar:
        print(cookie.name)

main()
When I run this, the first for loop did print "BAIDUID", so that part is good. However, the second for loop printed nothing; it turned out the second cookie jar was just empty. I am not sure what I did wrong here. Please help.
Your second response has no cookies because you set the request cookies manually in the cookies parameter, so the server won't send a 'Set-Cookie' header.
Passing cookies across requests with the cookies parameter is not a good idea; use a Session object instead.
import requests

def main():
    ses = requests.Session()
    ses.headers['User-Agent'] = 'Mozilla/5'
    url = "https://pan.baidu.com/share/verify?surl=pK753kf&t=1508812575130&bdstoken=null&channel=chunlei&clienttype=0&web=1&app_id=250528&logid=MTUwODgxMjU3NTEzMTAuMzM2MTI4Njk5ODczMDUxNw=="
    ref = "https://pan.baidu.com/share/init?surl=pK753kf"
    headers = {"Referer":ref}
    password={'pwd': 'v0vb'}
    response = ses.get(ref)
    cookieJar = ses.cookies
    for cookie in cookieJar:
        print(cookie.name)
    response = ses.post(url, data=password, headers=headers)
    cookieJar = ses.cookies
    for cookie in cookieJar:
        print(cookie.name)

main()
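If you just want to check the outcome rather than print every cookie name, a small optional addition at the end of main() (my own suggestion, not part of the original answer) would be:
    # after the POST: the session's jar now holds all cookies received so far
    if 'BDCLND' in ses.cookies:
        print('Login OK, BDCLND = ' + ses.cookies['BDCLND'])
    else:
        print('Login failed, no BDCLND cookie received')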
import urllib3
import io
from bs4 import BeautifulSoup
import re
import cookielib

http = urllib3.PoolManager()
url = 'http://www.example.com'
headers = urllib3.util.make_headers(keep_alive=True,user_agent='Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6')
r = http.urlopen('GET', url, preload_content=False)

# Parameters that are then passed with the POST request (login)
params = {
    'login': '/shop//index.php',
    'user': 'username',
    'pw': 'password'
}
suche = {
    'id' : 'searchfield',
    'name' : 'suche',
}

# POST request incl. params (login); the answer ends up in response.data
response = http.request('POST', url, params, headers)
suche = http.request('POST', site-to-search? , suche, headers)
html_suche = suche.data
print html_suche
I am trying to log in to a site with this code and then search. With this code I get a response saying that I am not logged in. How can I combine the two, so that I first log in and then search?
Thanks.
Web servers track browser-like client state by setting cookies, which the client must return. By default, urllib3 does not pretend to be a browser, so we need to do a little extra work to relay the cookie back to the server. Here's an example of how to do this with httpbin.org:
import urllib3
http = urllib3.PoolManager()
# httpbin does a redirect right after setting a cookie, so we disable redirects
# for this request
r = http.request('GET', 'http://httpbin.org/cookies/set?foo=bar', redirect=False)
# Grab the set-cookie header and build our headers for our next request.
# Note: This is a simplified version of what a browser would do.
headers = {'cookie': r.getheader('set-cookie')}
print headers
# -> {'cookie': 'foo=bar; Path=/'}
r = http.request('GET', 'http://httpbin.org/cookies', headers=headers)
print r.data
# -> {
# "cookies": {
# "foo": "bar"
# }
# }
(Note: This recipe is useful and urllib3's documentation would benefit from having it. I'd appreciate a pull request which adds something to this effect.)
Another option, as mentioned by Martijn, is to use a higher-level library that pretends to be more like a browser. robobrowser looks like a great choice for this kind of work, but requests also has provisions for managing cookies for you, and it uses urllib3 underneath. :)
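For comparison, here is the same httpbin round trip with requests, where the Session keeps the cookie for you:
import requests

session = requests.Session()
# the Set-Cookie from the first response is stored in the session's cookie jar
session.get('http://httpbin.org/cookies/set?foo=bar')
# ...and sent back automatically on the next request
print(session.get('http://httpbin.org/cookies').json())
# -> {'cookies': {'foo': 'bar'}}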