I want to use multiple HTTP proxies, but I cannot find a way in the documentation to pass more than one proxy.
Here is my code:
proxies = {
'http': [List of IPs]
}
r = requests.get('http://10.1.7.70:8000', proxies=proxies)
When I run this code, I get the following error:
TypeError: unhashable type: 'list'
How can I use multiple proxies?
If your goal is to select a proxy from your list to use with requests:
import random
import requests
proxies_list = [List of IPs]
proxies = {
'http': random.choice(proxies_list)
}
r = requests.get('http://10.1.7.70:8000', proxies=proxies)
If you want to chain proxies, requests cannot do that; you have to handle it yourself.
The proxies argument is not a list but a dictionary that maps a URL scheme to a single proxy URL:
proxies = {'http': 'http://10.10.1.10:3128', 'https': 'http://10.10.1.11:1080'}
To try each proxy in your list, iterate over the list and build one such dictionary per request:
for ip in proxies_list:
    r = requests.get('http://10.1.7.70:8000', proxies={'http': ip})
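If you would rather spread many requests across the whole list instead of picking one proxy, a minimal sketch (reusing a hypothetical proxies_list like the one above) is to cycle through it in round-robin order:
import itertools
import requests

proxies_list = ['10.10.1.10:3128', '10.10.1.11:1080']   # placeholder addresses for illustration
proxy_pool = itertools.cycle(proxies_list)

for url in ['http://10.1.7.70:8000'] * 4:                # example: hit the same target a few times
    proxy = next(proxy_pool)                             # next proxy in round-robin order
    try:
        r = requests.get(url, proxies={'http': proxy}, timeout=5)
        print(proxy, r.status_code)
    except requests.RequestException as exc:             # skip dead proxies and keep going
        print(proxy, 'failed:', exc)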
I'm using Playwright to extract data from a website, and I want to use proxies that I get from this site: https://www.proxy-list.download/HTTPS. It doesn't work, and I'm wondering if that is because the proxies are free. If so, does anyone know where I can find proxies that will work?
This is my code:
from playwright.sync_api import sync_playwright
import time
url = "https://www.momox-shop.fr/livres-romans-et-litterature-C055/"
with sync_playwright() as p:
    browser = p.firefox.launch(
        headless=False,
        proxy={
            'server': '209.166.175.201:3128'
        })
    page = browser.new_page()
    page.goto(url)
    time.sleep(5)
Thank you!
Yes, according to your link, all of those proxies are "dead".
Before using proxies, check them first. Here is one possible solution:
import json
import requests
from pythonping import ping
from concurrent.futures import ThreadPoolExecutor
check_proxies_url = "https://httpbin.org/ip"
good_proxy = set()
# proxy_lst = requests.get("https://www.proxy-list.download/api/v1/get", params={"type": "https"})
# proxies = [proxy for proxy in proxy_lst.text.split('\r\n') if proxy]
proxy_lst = requests.get("http://proxylist.fatezero.org/proxy.list")
proxies = (f"{json.loads(data)['host']}:{json.loads(data)['port']}" for data in proxy_lst.text.split('\n') if data)
def get_proxies(proxy):
    proxies = {
        "https": proxy,
        "http": proxy
    }
    try:
        response = requests.get(url=check_proxies_url, proxies=proxies, timeout=2)
        response.raise_for_status()
        if ping(target=proxies["https"].split(':')[0], count=1, timeout=2).rtt_avg_ms < 150:
            good_proxy.add(proxies["https"])
            print(f"Good proxies: {proxies['https']}")
    except Exception:
        print(f"Bad proxy: {proxies['https']}")

with ThreadPoolExecutor() as executor:
    executor.map(get_proxies, proxies)

print(good_proxy)
This gives you a list of working proxies with a ping of up to 150 ms.
Output:
{'209.166.175.201:8080', '170.39.194.156:3128', '20.111.54.16:80', '20.111.54.16:8123'}
But in any case, these are shared proxies and their performance is not guaranteed. If you want to be sure that your scraper will work, it is better to buy a proxy.
I ran your code with the proxy '170.39.194.156:3128' obtained this way, and for now it works.
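For completeness, a minimal sketch of plugging one of the proxies that passed the check back into the Playwright launch from the question (the address is only an example and may no longer be alive; username/password are only needed for authenticated proxies):
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.firefox.launch(
        headless=False,
        proxy={
            'server': 'http://170.39.194.156:3128',   # a proxy that passed the check above
            # 'username': 'user',                     # only for authenticated proxies
            # 'password': 'pass',
        })
    page = browser.new_page()
    page.goto("https://www.momox-shop.fr/livres-romans-et-litterature-C055/")
    browser.close()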
I'm working on a web-scraping function that's going to be pulling HTML data from internal (non-public) servers. I have a connection through a VPN and a proxy server, so when I ping any public site I get code 200 no problem, but our internal servers are returning 401.
Here's my code:
import requests
from requests.auth import HTTPBasicAuth

http_str = f'http://{username}:{password}@proxy.yourorg.com:80'
https_str = f'http://{username}:{password}@proxy.yourorg.com:80'
proxyDict = {
'http' : http_str,
'https' : https_str,
'ftp' : https_str
}
html_text = requests.get(url, verify=True, proxies=proxyDict, auth=HTTPBasicAuth(user, pwd))
I've tried flushing my DNS server, using different certificate chains (that had a whole new list of problems). I'm using urllib3 on version 1.23 because that seemed to help with SSL errors. I've considered using a requests session but I'm not sure what that would change.
Also, the URLs we're trying to access DO NOT require a login. I'm not sure why it's throwing 401 errors, but the auth is for the proxy server, I think. Any help or ideas are appreciated, along with questions, as at this point I'm not even sure what to ask to move this along.
Edit: the proxyDict has a string with the user and pwd passed in for each type: https, http, ftp, etc.
To use HTTP Basic Auth with your proxy, use the http://user:password@host/ syntax in any of the proxy configuration entries. See the requests API docs on proxies.
import requests
proxyDict = {
"http": "http://username:password#proxy.yourorg.com:80",
"https": "http://username:password#proxy.yourorg.com:80"
}
url = 'http://myorg.com/example'
response = requests.get(url, proxies=proxyDict)
If, however, you are accessing internal URLs via VPN (i.e., internal to your organization on your intranet) then you should NOT need the proxy to access them.
Try:
import requests
url = 'http://myorg.com/example'
response = requests.get(url, verify=False)
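One more thing worth checking (this part is an assumption about your environment): if HTTP_PROXY/HTTPS_PROXY environment variables are set, requests will pick them up even when you don't pass proxies=. You can tell a session to ignore them:
import requests

session = requests.Session()
session.trust_env = False          # ignore proxy settings from environment variables and .netrc
response = session.get('http://myorg.com/example', verify=False)
print(response.status_code)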
I am trying to scrape a website using Python requests. The website can only be scraped through proxies, so I implemented the code for that. However, it is banning all my requests even when I am using proxies, so I used https://api.ipify.org/?format=json to check whether the proxies are working properly. I found it showing my original IP even while using proxies. The code is below:
from concurrent.futures import ThreadPoolExecutor
import string, random
import requests
import sys
http = []

# load proxies into the list
with open(sys.argv[1], "r", encoding="utf-8") as data:
    for i in data:
        http.append(i[:-1])

url = "https://api.ipify.org/?format=json"

def fetch(session, url):
    for i in range(5):
        proxy = {'http': 'http://' + random.choice(http)}
        try:
            with session.get(url, proxies=proxy, allow_redirects=False) as response:
                print("Proxy : ", proxy, " | Response : ", response.text)
                break
        except:
            pass

# @timer(1, 5)
if __name__ == '__main__':
    with ThreadPoolExecutor(max_workers=1) as executor:
        with requests.Session() as session:
            executor.map(fetch, [session] * 100, [url] * 100)
            executor.shutdown(wait=True)
I tried a lot but didn't understand why my IP address is shown instead of the proxy's IPv4 address. You will find the output of the code here: https://imgur.com/a/z02uSvi
The problem is that you have set a proxy only for http but are sending the request to a website that uses https. The solution is simple:
proxies = dict.fromkeys(('http', 'https', 'ftp'), 'http://' + random.choice(http))
# You can set proxy for session
session.proxies.update(proxies)
response = session.get(url)
# Or you can pass proxy as argument
response = session.get(url, proxies=proxies)
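Applied to the fetch function from the question, a minimal sketch (reusing the http list, url, and imports from the question's code) could look like this:
def fetch(session, url):
    for _ in range(5):
        # cover both schemes with the same randomly chosen proxy
        proxies = dict.fromkeys(('http', 'https'), 'http://' + random.choice(http))
        try:
            response = session.get(url, proxies=proxies, allow_redirects=False, timeout=5)
            print("Proxy:", proxies['https'], "| Response:", response.text)
            break
        except requests.RequestException:
            pass  # try another proxy on failure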
General Background:
I have a list of proxies, and I have a list of PDF URLs that I am downloading.
I want to be able to switch proxies every couple of downloads.
I've seen the following in a few answers, but are all of the proxies used at once? Or is one picked at random from the dict of proxies? How do I choose which proxy to use?
proxies = {
'https': 'http://username:password#ip:port',
'https': 'http://usernamepassword#ip:port',
'https': 'http://usernamepassword#ip:port',
'https': 'http://usernamepassword#ip:port',
'https': 'http://usernamepassword#ip:port',
'https': 'http://usernamepassword#ip:port'
}
Here is a sample of the code I currently have.
My Code:
s = requests.Session()
data = {"Username":"usr", "Password":"psw"}
url = "https://someSite.com"
#Logging into the site
s.post(url, data=data) #add proxies=proxies here?
for download_url in PDFLinks:
    temp = s.get(download_url).content
I have a list of usable proxy servers:
https_proxy_list = "https://IP:port", "https://IP:port", "https://IP:port"
How do I change the proxy of a requests.Session() object, for both POST and GET?
If I change the proxy, I don't have to log into the site again, right?
Just keep a list of proxies and cycle through them:
s = requests.Session()
proxyList = ['Just imagine there are a few proxies here']
usable_IP = []

for item in proxyList:
    r2 = s.get(login_url, proxies={'https': item}, verify=False)
    print(r2.status_code)
    if r2.status_code == 200:
        print("It worked")
        usable_IP.append(item)
        print(usable_IP)

print(usable_IP)
This is the code I'm currently using, and it solved the problem I was having. (12/13/2017)
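To answer the second part of the question: you can also set the proxy on the session itself, so every subsequent GET and POST goes through it while the session keeps your login cookies, meaning you should not need to log in again just because the proxy changed. A minimal sketch, assuming the PDFLinks and https_proxy_list from the question:
import random
import requests

s = requests.Session()
s.post("https://someSite.com", data={"Username": "usr", "Password": "psw"})   # log in once; cookies stay on the session

for i, download_url in enumerate(PDFLinks):
    if i % 5 == 0:                                     # switch proxy every few downloads
        s.proxies = {'https': random.choice(https_proxy_list)}
    temp = s.get(download_url).content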
I'm having some difficulty getting requests to use the proxy address when requesting a website. No error is returned, but when the script fetches http://ipecho.net/plain, I can see my own IP, not that of the proxy.
import random
import requests
import time
def proxy():
    proxy = (random.choice(proxies)).strip()
    print("selected proxy: {0}".format(proxy))
    url = 'http://ipecho.net/plain'
    data = requests.get(url, proxies={"https": proxy})
    print(data)
    print("data returned: {0}".format(data.text))

proxies = []
with open("proxies.txt", "r") as fi:
    for line in fi:
        proxies.append(line)

while True:
    proxy()
    time.sleep(5)
The structure of the proxies.txt file is as follows:
https://95.215.111.184:3128
https://79.137.80.210:3128
Can anyone explain this behaviour?
The URL you are passing is http and you only provide an https proxy key. You need to create a key in your proxies dictionary for both http and https. These can point to the same value.
proxies = {'http': 'http://proxy.example.com', 'https': 'http://proxy.example.com'}
data = requests.get(url, proxies=proxies)
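Putting it together with the proxies.txt file from the question, a minimal sketch that covers both schemes with the same proxy (note the file lists https:// proxy URLs; plain http:// proxy URLs are more common, but either form can be used as the proxy address):
import random
import requests

with open("proxies.txt") as fi:
    proxy_urls = [line.strip() for line in fi if line.strip()]

chosen = random.choice(proxy_urls)
proxies = {'http': chosen, 'https': chosen}   # same proxy for both schemes

data = requests.get('http://ipecho.net/plain', proxies=proxies, timeout=10)
print("data returned: {0}".format(data.text))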