How do you change the proxy of a requests.Session() Object? - python

General Background:
I have a list of proxies, and I have a list of PDF url's. I am downloading these PDF's.
I want to be able to switch proxies every couple of downloads.
I've seen the following in a few answers, but are all of the proxies used at once? Or is it random from the dict of proxies? How do I choose which proxy to use?
proxies = {
    'https': 'http://username:password@ip:port',
    'https': 'http://username:password@ip:port',
    'https': 'http://username:password@ip:port',
    'https': 'http://username:password@ip:port',
    'https': 'http://username:password@ip:port',
    'https': 'http://username:password@ip:port'
}
Here is a sample of the code I currently have.
My Code:
import requests

s = requests.Session()
data = {"Username": "usr", "Password": "psw"}
url = "https://someSite.com"
# Logging into the site
s.post(url, data=data)  # add proxies=proxies here?
for download_url in PDFLinks:
    temp = s.get(download_url).content
I have a list of usable proxy servers:
https_proxy_list = ["https://IP:port", "https://IP:port", "https://IP:port"]
How do I change the proxy of a requests.Session() object, for both POST and GET?
By changing the proxy, I don't have to re-log into the site, right?

Just have a list of proxies and then cycle through them:
import requests

s = requests.Session()
login_url = "https://someSite.com"  # the login URL from the question
usable_IP = []
proxyList = ['Just imagine there are a few proxies here']
for item in proxyList:
    r2 = s.get(login_url, proxies={'https': item}, verify=False)
    print(r2.status_code)
    if r2.status_code == 200:
        print("It worked")
        usable_IP.append(item)
print(usable_IP)
This is the code I'm currently using, and it solved the problem I was having. (12/13/2017)
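To answer the side questions directly: nothing in that dict is used "at once" or at random. A Python dict keeps only one value per key, so the six 'https' entries silently collapse to the last one. And because the login cookies live on the Session rather than on the proxy, switching proxies does not require logging in again. A minimal sketch of proxy rotation on the session itself, with placeholder values standing in for the question's PDFLinks and https_proxy_list:
import requests

# A dict literal keeps only the LAST value per repeated key, so the
# six-entry 'https' dict from the question collapses to one proxy:
print({'https': 'http://a:1', 'https': 'http://b:2'})
# {'https': 'http://b:2'}

# Placeholders standing in for the question's variables:
PDFLinks = ["https://someSite.com/a.pdf", "https://someSite.com/b.pdf"]
https_proxy_list = ["https://1.2.3.4:3128", "https://5.6.7.8:3128"]

s = requests.Session()
s.post("https://someSite.com", data={"Username": "usr", "Password": "psw"})

# Cookies live on the Session, not on the proxy, so updating the
# session-wide proxy mapping does NOT log you out. Every later
# s.get()/s.post() uses whatever is in s.proxies at that moment.
for i, download_url in enumerate(PDFLinks):
    if i % 5 == 0:  # rotate to the next proxy every 5 downloads
        s.proxies.update({'https': https_proxy_list[(i // 5) % len(https_proxy_list)]})
    temp = s.get(download_url).content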

Related

Using proxies with playwright in python

I'm using playwright to extract data from a website, and I want to use proxies, which I get from this website: https://www.proxy-list.download/HTTPS. It doesn't work, and I'm wondering if that is because the proxies are free. If so, does anyone know where I can find proxies that will work?
This is my code:
from playwright.sync_api import sync_playwright
import time

url = "https://www.momox-shop.fr/livres-romans-et-litterature-C055/"
with sync_playwright() as p:
    browser = p.firefox.launch(
        headless=False,
        proxy={
            'server': '209.166.175.201:3128'
        })
    page = browser.new_page()
    page.goto(url)
    time.sleep(5)
Thank you!
Yes, according to your link, all the proxies are "dead".
Before using proxies, try checking them first. Here is one possible solution:
import json
import requests
from pythonping import ping
from concurrent.futures import ThreadPoolExecutor

check_proxies_url = "https://httpbin.org/ip"
good_proxy = set()

# proxy_lst = requests.get("https://www.proxy-list.download/api/v1/get", params={"type": "https"})
# proxies = [proxy for proxy in proxy_lst.text.split('\r\n') if proxy]
proxy_lst = requests.get("http://proxylist.fatezero.org/proxy.list")
proxies = (f"{json.loads(data)['host']}:{json.loads(data)['port']}" for data in proxy_lst.text.split('\n') if data)

def get_proxies(proxy):
    # Try the same proxy for both schemes
    proxies = {
        "https": proxy,
        "http": proxy
    }
    try:
        response = requests.get(url=check_proxies_url, proxies=proxies, timeout=2)
        response.raise_for_status()
        if ping(target=proxies["https"].split(':')[0], count=1, timeout=2).rtt_avg_ms < 150:
            good_proxy.add(proxies["https"])
            print(f"Good proxy: {proxies['https']}")
    except Exception:
        print(f"Bad proxy: {proxies['https']}")

with ThreadPoolExecutor() as executor:
    executor.map(get_proxies, proxies)

print(good_proxy)
This gets you a list of active proxies with a ping under 150 ms.
Output:
{'209.166.175.201:8080', '170.39.194.156:3128', '20.111.54.16:80', '20.111.54.16:8123'}
But in any case, these are shared proxies, and their performance is not guaranteed. If you want to be sure that your scraper will work, it is better to buy a proxy.
I ran your code with the proxy '170.39.194.156:3128' that the check returned, and for now it works.
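If the checker turns up a live proxy, it can be plugged straight into Playwright's launch() call; Playwright's proxy option also takes username and password fields for authenticated (paid) proxies. A short sketch, assuming the '170.39.194.156:3128' proxy found above is still alive:
from playwright.sync_api import sync_playwright

url = "https://www.momox-shop.fr/livres-romans-et-litterature-C055/"
proxy_server = "170.39.194.156:3128"  # assumption: a proxy that passed the check above

with sync_playwright() as p:
    browser = p.firefox.launch(
        headless=False,
        proxy={
            "server": proxy_server,
            # For a paid/authenticated proxy you would also fill in
            # these two fields (placeholders, not real credentials):
            # "username": "user",
            # "password": "pass",
        },
    )
    page = browser.new_page()
    page.goto(url)
    print(page.title())  # quick sanity check that the page loaded
    browser.close()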

problem with python requests while using proxies

I am trying to scrape a website using python requests. We can only scrape the website through proxies, so I implemented the code for that. However, it is banning all my requests even when I am using proxies, so I used the website https://api.ipify.org/?format=json to check whether the proxies were working properly or not. I found it showing my original IP even while using proxies. The code is below:
from concurrent.futures import ThreadPoolExecutor
import string, random
import requests
import sys

http = []
# loading proxies into the list
with open(sys.argv[1], "r", encoding="utf-8") as data:
    for i in data:
        http.append(i[:-1])

url = "https://api.ipify.org/?format=json"

# @timer(1, 5)
def fetch(session, url):
    for i in range(5):
        proxy = {'http': 'http://' + random.choice(http)}
        try:
            with session.get(url, proxies=proxy, allow_redirects=False) as response:
                print("Proxy : ", proxy, " | Response : ", response.text)
            break
        except:
            pass

if __name__ == '__main__':
    with ThreadPoolExecutor(max_workers=1) as executor:
        with requests.Session() as session:
            executor.map(fetch, [session] * 100, [url] * 100)
        executor.shutdown(wait=True)
I tried a lot but don't understand why my IP address is shown instead of the proxy's IPv4. You can find the output of the code here: https://imgur.com/a/z02uSvi
The problem is that you set a proxy for http but are sending the request to a website that uses https. The solution is simple:
proxies = dict.fromkeys(('http', 'https', 'ftp'), 'http://' + random.choice(http))
# You can set proxy for session
session.proxies.update(proxies)
response = session.get(url)
# Or you can pass proxy as argument
response = session.get(url, proxies=proxies)
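A quick way to confirm the fix took effect is to hit an IP-echo endpoint over both schemes; if the proxy is applied, you should see its address rather than your own. A minimal sketch, with a placeholder proxy list standing in for the file loaded in the question:
import random
import requests

http = ["1.2.3.4:3128"]  # placeholder; the question loads these from a file

session = requests.Session()
# One proxy value mapped to every scheme the session might use.
proxies = dict.fromkeys(('http', 'https', 'ftp'), 'http://' + random.choice(http))
session.proxies.update(proxies)

# Both requests should now report the proxy's IP, not yours.
print(session.get("https://api.ipify.org/?format=json").text)
print(session.get("http://api.ipify.org/?format=json").text)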

How do I find the required parameters for the payload when logging into a website with requests?

This is what I have so far. I'm very new to this, so point out if I'm doing anything wrong.
import requests

url = "https://9anime.to/user/watchlist"
payload = {
    "username": "blahblahblah",
    "password": "secretstringy"
    # anything else?
}
with requests.Session() as s:
    res = s.post(url, data=payload)
    print(res.status_code)
You will need to inspect the form and see where the form is posting to in the action tag. In this case it is posting to user/ajax/login. Instead of requesting the watchlist URL you should post those details to the loginurl. Once you are logged in you can request your watchlist.
from lxml.html import fromstring
import requests

url = "https://9anime.to/user/watchlist"
loginurl = "https://9anime.to/user/ajax/login"
payload = {
    "username": "someemail@gmail.com",
    "password": "somepass"
}
with requests.Session() as s:
    res = s.post(loginurl, data=payload)
    print(res.content)
    # b'{"success":true,"message":"Login successful"}'
    res = s.get(url)
    tree = fromstring(res.content)
    elem = tree.cssselect("div.watchlist div.widget-body")[0]
    print(elem.text_content())
    # Your watch list is empty.
You would need knowledge (documentation of some form) of what that URL expects and how you are expected to interact with it. There is no way to know given only the information you have provided.
If you already have some system that can interact with that URL (e.g. you're able to log in with your browser), then you could try to reverse-engineer what your browser is doing...
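One way to discover what the form actually sends is to parse the login page itself: the form's action attribute gives the POST target, and its input elements (including hidden token fields) give the parameter names the server expects. A hedged sketch using lxml, with the page URL as a placeholder:
from lxml.html import fromstring
import requests

login_page_url = "https://example.com/login"  # placeholder: the page containing the form

with requests.Session() as s:
    res = s.get(login_page_url)
    tree = fromstring(res.content)
    form = tree.cssselect("form")[0]

    # Where the form posts to (may be relative to the page URL).
    print("action:", form.get("action"))

    # Every input name the server expects, including hidden tokens.
    for inp in form.cssselect("input"):
        print(inp.get("name"), "=", inp.get("value"))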

Python Requests Library not utilising proxy

I'm experiencing some difficulty getting requests to use the proxy address when requesting a website. No error is returned, but by having the script fetch http://ipecho.net/plain, I can see my own IP, not that of the proxy.
import random
import requests
import time

def proxy():
    proxy = (random.choice(proxies)).strip()
    print("selected proxy: {0}".format(proxy))
    url = 'http://ipecho.net/plain'
    data = requests.get(url, proxies={"https": proxy})
    print(data)
    print("data returned: {0}".format(data.text))

proxies = []
with open("proxies.txt", "r") as fi:
    for line in fi:
        proxies.append(line)

while True:
    proxy()
    time.sleep(5)
The structure of the proxies.txt file is as follows:
https://95.215.111.184:3128
https://79.137.80.210:3128
Can anyone explain this behaviour?
The URL you are passing is http and you only provide an https proxy key. You need to create a key in your proxies dictionary for both http and https. These can point to the same value.
proxies = {'http': 'http://proxy.example.com', 'https': 'http://proxy.example.com'}
data = requests.get(url, proxies=proxies)
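Since the proxies.txt entries also carry a trailing newline, it may help to strip them while loading and to build the two-scheme dict up front; a small sketch along those lines, assuming the same file layout as the question:
import requests

# Load proxies.txt, dropping the trailing newline from each entry.
with open("proxies.txt", "r") as fi:
    proxy_urls = [line.strip() for line in fi if line.strip()]

url = 'http://ipecho.net/plain'
for p in proxy_urls:
    # Cover both schemes so the proxy is used whether the URL is http or https.
    data = requests.get(url, proxies={'http': p, 'https': p})
    print("via", p, "->", data.text)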

Requests - Multiple proxies python

I want to use multiple HTTP proxies, but I cannot find a way in the documentation to pass more than one.
Here is my code:
proxies = {
    'http': [List of IPs]
}
r = requests.get('http://10.1.7.70:8000', proxies=proxies)
While running this code, I get the following error:
TypeError: unhashable type: 'list'
How can I use multiple proxies?
If your goal is to select a proxy from your list to use with requests:
import random
import requests

proxies_list = [List of IPs]
proxies = {
    'http': random.choice(proxies_list)
}
r = requests.get('http://10.1.7.70:8000', proxies=proxies)
If you want to chain proxies, requests cannot do it; you would need to do it by hand.
Indeed, proxies aren't lists; they are dictionaries:
proxy = {key1: value1, key2: value2, ...}
You need to iterate over the keys, passing a one-entry dict each time:
for i in proxy:
    r = requests.get('http://10.1.7.70:8000', proxies={i: proxy[i]})
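If the goal is to spread requests across the whole list rather than pick a single proxy, a common pattern is to cycle through the proxies round-robin and skip any that fail; a rough sketch, with placeholder IPs standing in for the real list:
from itertools import cycle
import requests

proxies_list = ["1.2.3.4:3128", "5.6.7.8:3128"]  # placeholder IPs
proxy_pool = cycle(proxies_list)

urls = ['http://10.1.7.70:8000'] * 6
for u in urls:
    proxy = next(proxy_pool)  # round-robin: a different proxy each request
    try:
        r = requests.get(u, proxies={'http': proxy}, timeout=5)
        print(proxy, r.status_code)
    except requests.RequestException:
        print(proxy, "failed, moving on to the next one")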
