requests.exceptions.InvalidSchema: Missing dependencies for SOCKS support - python

I would like to open a page through a proxy using Requests.
https://stackoverflow.com/…/make-requests-using-python-over…
I have this code:
import requests

def get_tor_session():
    session = requests.session()
    # Tor uses port 9050 as the default SOCKS port
    session.proxies = {'http': 'socks5://127.0.0.1:9050',
                       'https': 'socks5://127.0.0.1:9050'}
    return session

# Make a request through the Tor connection
# IP visible through Tor
session = get_tor_session()
print(session.get("http://httpbin.org/ip").text)
# Above should print an IP different than your public IP

# Following prints your normal public IP
print(requests.get("http://httpbin.org/ip").text)
But I see:
requests.exceptions.InvalidSchema: Missing dependencies for SOCKS support.
What should I do?
Thanks

In order to make Requests use a SOCKS proxy, you need to install it together with its SOCKS dependency:
pip install requests requests[socks]

This error means that Requests is trying to use SOCKS for a proxy, but the SOCKS support libraries are not installed.
Just run pip install pysocks
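Once PySocks is installed, the snippet from the question should work unchanged; a minimal sanity check (assuming Tor is listening on 127.0.0.1:9050, as in the question) is:
import requests

session = requests.session()
# With PySocks installed, the socks5:// scheme no longer raises InvalidSchema
session.proxies = {'http': 'socks5://127.0.0.1:9050',
                   'https': 'socks5://127.0.0.1:9050'}
# Should print a Tor exit IP rather than your public IP
print(session.get("http://httpbin.org/ip").text)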

Related

Python: Can We SSL wrap any http server to https server?

This is a simple HTTPS python server (not for production use)
# libraries needed:
from http.server import HTTPServer, SimpleHTTPRequestHandler
import ssl, socket

# address set
server_ip = '0.0.0.0'
server_port = 3389

# configuring HTTP -> HTTPS
httpd = HTTPServer((server_ip, server_port), SimpleHTTPRequestHandler)
httpd.socket = ssl.wrap_socket(httpd.socket,
                               certfile='./public_cert.pem',
                               keyfile='./private_key.pem',
                               server_side=True)
httpd.serve_forever()
As you can see in the above script, I wrap a simple Python HTTP server with an SSL certificate and turn it into a simple HTTPS server.
Detailed example of python simple HTTPS server
Is there a way to SSL-wrap any already-running HTTP server into an HTTPS server?
Suppose I have an HTTP server running on port 8080. Can I just SSL-wrap it onto port 443?

How do I host an HTTP proxy?

I am using Requests to scrape a website. My scraping code runs on one computer, but I need the requests to come from a different computer (from the perspective of the website being scraped). I understand that I can do this with Requests by passing a proxies= argument when creating my session, and that I have two options: an HTTP proxy or a SOCKS proxy. I understand how to host a SOCKS proxy, because it just works over SSH; I only need to be able to SSH into the proxy machine from the machine running the scraping code and use -D, like this
# Generate key
ssh-keygen -o -a 100 -t ed25519 -C ''
# Copy key to proxy machine
ssh-copy-id -i ~/.ssh/id_ed25519.pub <username>@<ip of the computer acting as a proxy>
# Open a connection to that server on some local port (I randomly chose port 14171)
ssh -D 14171 root@<ip of the computer acting as a proxy>
then I can make requests like this
from requests import Session

proxies = {
    'http': 'socks5://localhost:14171',
    'https': 'socks5://localhost:14171',
}
session = Session()
session.proxies.update(proxies)
session.get('http://example.com')
I understand that with an HTTP proxy it's quite similar, I just do
proxies = {
    'http': 'http://user:pass@10.10.1.10:1080',
    'https': 'http://user:pass@10.10.1.10:1080',
}
but what do I use on the server to make it act as an HTTP proxy with a password? And are messages sent in the clear or encrypted?
There are many implementations of HTTP proxies to choose from. Squid seems to be the first result on Google. I also tried Tinyproxy. With Squid, you set it up like this:
Install Squid
apt install squid apache2-utils
Create the password file
sudo touch /etc/squid/squid_passwd
sudo chown proxy /etc/squid/squid_passwd
Then edit the configuration file
mv /etc/squid/squid.conf /etc/squid/squid.conf.default # move default file out of the way
vim /etc/squid/squid.conf
and paste the following as the configuration:
http_port 3128
auth_param basic program /usr/lib/squid/basic_ncsa_auth /etc/squid/squid_passwd
auth_param basic realm proxy
acl authenticated proxy_auth REQUIRED
http_access allow authenticated
You can add those lines to the end of the default file instead, but the problem is that the default config is about 8000 lines of documentation (roughly 25 lines of actual configuration), and somewhere in there it forbids connections (probably all connections not from localhost), so you would have to read through it all. I didn't have time for that, so I just cleared the file and used the config above as the whole configuration. You should probably take the time to actually learn Squid if you're going to use it, though...
Create a password for a user (youruser is a username, you can choose whatever)
htpasswd /etc/squid/squid_passwd youruser
Restart Squid
service squid restart
Open the port in the firewall
iptables -A INPUT -m state --state NEW -m tcp -p tcp --dport 3128 -j ACCEPT
You can then check that it works with Curl:
curl --proxy <the IP address of your proxy>:3128 --proxy-user youruser:<password> "http://icanhazip.com"
Tinyproxy is pretty similar. It has the advantage that you don't have to install a separate package just to set a password, and its default config file is actually short enough to read...
Install Tinyproxy
sudo apt install tinyproxy
Edit the config file
sudo vim /etc/tinyproxy/tinyproxy.conf
These are the options I needed to set (a sample of the resulting config follows below):
change the port to some random port: Port 17724
comment out the Allow 127.0.0.1 line to allow connections from any IP
add a line to enable a password: BasicAuth youruser yourpassword
(optional) disable adding a "Via" header (this header lets the servers you're making requests to know that you're using a proxy): DisableViaHeader Yes
(optional) disable everything except reverse-proxying: ReverseOnly Yes
You may want to read through the entire default config file, maybe there are other options you need for your use-case.
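For reference, a minimal sketch of what the edited tinyproxy.conf could look like after the changes above (the Allow line is left commented out so any IP may connect; youruser and yourpassword are placeholders to replace with your own):
Port 17724
#Allow 127.0.0.1
BasicAuth youruser yourpassword
DisableViaHeader Yes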
Restart the Tinyproxy systemd service
sudo service tinyproxy restart
Open the port in the firewall
sudo iptables -A INPUT -m state --state NEW -m tcp -p tcp --dport 17724 -j ACCEPT
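As with Squid, you can check that Tinyproxy works with Curl, using the port and credentials chosen above:
curl --proxy <the IP address of your proxy>:17724 --proxy-user youruser:yourpassword "http://icanhazip.com"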
You can then use your proxy with Requests like this
proxies = {
    'http': 'http://<youruser>:<password>@<the IP address of your proxy>:3128',
    'https': 'http://<youruser>:<password>@<the IP address of your proxy>:3128'
}
Proxies also allow you to limit connections to only a given IP address, so if the server you're running the code on has a static IP, it would be a good idea to limit connections to only that IP. Note that HTTP proxying is not encrypted, so a man-in-the-middle would be able to see your password and then use your proxy.
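A sketch of restricting the proxy to a single source IP with iptables (203.0.113.5 is a made-up example standing in for the static IP of the machine running your scraping code, and 3128 is the proxy port; adjust both):
sudo iptables -A INPUT -p tcp -s 203.0.113.5 --dport 3128 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 3128 -j DROP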
Sources:
https://www.vultr.com/docs/how-to-install-squid-proxy-on-centos
https://www.vultr.com/docs/install-squid-proxy-on-ubuntu (a bit outdated)

ERROR: Proxy URL had no scheme. However, URL & Proxies are properly setup

I'm getting the error:
urllib3.exceptions.ProxySchemeUnknown: Proxy URL had no scheme, should start with http:// or https://
but the proxies are fine & so is the URL.
URL = f"https://google.com/search?q={query2}&num=100"
mysite = self.listbox.get(0)
headers = {"user-agent": USER_AGENT}
while True:
    proxy = next(proxy_cycle)
    print(proxy)
    proxies = {"http": proxy, "https": proxy}
    print(proxies)
    resp = requests.get(URL, proxies=proxies, headers=headers)
    if resp.status_code == 200:
        break
Print results:
41.139.253.91:8080
{'http': '41.139.253.91:8080', 'https': '41.139.253.91:8080'}
On Linux, unset http_proxy and https_proxy in the terminal, from the current location of your project:
unset http_proxy
unset https_proxy
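If you would rather not touch the shell environment, a sketch of the same idea from within Python is to make the session ignore proxy environment variables via trust_env (the proxy address below just reuses the question's example, with an explicit http:// scheme added, since the error message itself says the proxy URL needs one):
import requests

session = requests.Session()
session.trust_env = False  # ignore http_proxy/https_proxy from the environment
# pass the proxies explicitly and give them a scheme
proxies = {"http": "http://41.139.253.91:8080",
           "https": "http://41.139.253.91:8080"}
resp = session.get("https://example.com", proxies=proxies)
print(resp.status_code)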
I had the same problem, and setting the https_proxy variable in my terminal really helped me. You can set it as follows:
set HTTPS_PROXY=http://username:password@proxy.example.com:8080
set https_proxy=http://username:password@proxy.example.com:8080
Where proxy.example.com is the proxy address (in my case it is "localhost") and 8080 is my port.
You can figure out your username by typing echo %username% in your command line. As for the proxy server, on Windows, you need to go to "Internet Options" -> "Connections" -> LAN Settings and tick "Use a proxy server for your LAN". There, you can find your proxy address and port.
An important note here: if you're using PyCharm, try first running your script from the terminal. I say this because you may get the same error if you just run the file by pressing the Run button, whereas running it from the terminal may help you get rid of this error.
P.S. Also, you can try to downgrade your pip to 20.2.3 as it may help you too.
I was having the same issue. I resolved it by upgrading the requests library for Python 3:
pip3 install --upgrade requests
I think it is related to an older version of the requests library conflicting with a newer version of Python 3.

Curl command in Requests Python

I am trying to write a Python equivalent of the cURL command below using the Requests library. I am not able to find the relevant flags to disable SSL verification and set no proxy.
curl -v -k -T debug.zip https://url-to-no-ssl-server/index.aspx --noproxy url-to-no-ssl-server -X POST -H "filename: debug.zip"
How do I convert this command to python-requests?
This SO Answer shows how to disable proxies:
session = requests.Session()
session.trust_env = False
The Requests documentation shows how to disable SSL verification:
Requests can also ignore verifying the SSL certificate if you set verify to False:
>>> requests.get('https://kennethreitz.com', verify=False)
<Response [200]>
By default, verify is set to True. The verify option only applies to host certs.
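Putting the two pieces together, a minimal sketch of the original cURL command in Requests (assuming debug.zip should be sent as the raw request body, which is what -T does, and keeping the question's placeholder URL) could look like:
import requests

session = requests.Session()
session.trust_env = False  # don't pick up proxies from the environment (the --noproxy part)

with open('debug.zip', 'rb') as f:
    resp = session.post(
        'https://url-to-no-ssl-server/index.aspx',  # placeholder URL from the question
        data=f,                                     # -T debug.zip: send the file as the request body
        headers={'filename': 'debug.zip'},          # -H "filename: debug.zip"
        verify=False,                               # -k: skip SSL certificate verification
    )
print(resp.status_code)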

Alternative to scrapy proxy

Is there any alternative to using a proxy in Scrapy? The source site has blocked the server which I'm using for running spiders. I've added a ProxyMiddleware to the project and randomized the proxy, but the proxies are also being blocked by the source site. I've also set DOWNLOAD_DELAY to 5, but the problem is still alive. Is there any other way to access the site without using proxies, other than shifting to a new server?
Using Tor with Polipo solved my problem of being blocked.
Install tor
$ sudo apt-get install tor
Install polipo
$ sudo apt-get install polipo
Configure Polipo to use the Tor SOCKS proxy:
$ sudo nano /etc/polipo/config
Add the following lines at the end of the file:
socksParentProxy = localhost:9050
diskCacheRoot=""
disableLocalInterface=""
Add a proxy middleware in middlewares.py:
class ProxyMiddleware(object):
    def process_request(self, request, spider):
        request.meta['proxy'] = 'http://localhost:8123'
        spider.log('Proxy : %s' % request.meta['proxy'])
Activate the ProxyMiddleware in the project settings:
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    'project_name.middlewares.ProxyMiddleware': 100
}
You may also want to look at Squid.
It can screen out failing proxies, prefer the faster ones, rotate proxies automatically, retry and forward requests automatically, and let you set rules.
Then you just point your spider at Squid as its single outgoing proxy (a sketch of such a configuration is below).
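A rough sketch of that kind of Squid setup (the upstream proxy addresses are made-up examples; cache_peer and never_direct are the relevant directives) might look like:
# /etc/squid/squid.conf (sketch): send every request through an upstream proxy,
# rotating between them round-robin and never connecting to sites directly
http_port 3128
cache_peer 10.10.1.10 parent 8080 0 no-query round-robin
cache_peer 10.10.1.11 parent 8080 0 no-query round-robin
never_direct allow all
http_access allow localhost
The spider's ProxyMiddleware would then point at http://localhost:3128 instead of at the individual proxies.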
