I wrote this code:
import urllib
proxies = {'http': 'http://112.65.135.54:8080/'}
opener = urllib.FancyURLopener(proxies)
r = opener.open("http://www.python.org/")
print r.read()
When I execute it, this program works fine and returns the source code of python.org. But when I use this:
import urllib
proxies = {'http': 'http://80.176.245.196:1080/'}
opener = urllib.FancyURLopener(proxies)
r = opener.open("http://www.python.org/")
print r.read()
this program does not return the source code of python.org.
What should I do?
I found the answer: I must use "socks" instead of "http":
import urllib
proxies = {'socks': 'http://80.176.245.196:1080/'}
opener = urllib.FancyURLopener(proxies)
r = opener.open("http://www.python.org/")
print r.read()
This code works fine.
Presumably, the first IP address and port point to a working proxy, while the second set does not (they're on private IPs, so of course nobody else can check). So, speak with whoever handles your local network, and get the exact specs for the IP and port of the HTTP proxy you're supposed to use!
Edit: aargh, the question had been edited to "mask" the IPs (now they're back and they're definitely not on private networks!) -- so the answer was based on that. Anyway, no need for digging now, as the OP has already discovered that one is a socks proxy, not an http proxy, and so of course can't be treated as the latter;-).
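For what it's worth, on a current setup the requests library can talk to a SOCKS proxy directly once its socks extra (PySocks) is installed. A minimal sketch, reusing the proxy address from the question:
import requests  # requires: pip install requests[socks]

# Route both plain-HTTP and HTTPS traffic through the SOCKS proxy.
proxies = {
    'http': 'socks5://80.176.245.196:1080',
    'https': 'socks5://80.176.245.196:1080',
}
r = requests.get('http://www.python.org/', proxies=proxies)
print(r.text)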
I need to connect and send a message to http://localhost:8001/path/to/my/service, but I am not able to find out how to do that. I know how to send if I only have localhost and 8001, but I need this specific path, /path/to/my/service. That is where my service is running.
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(<full-url-to-my-service>)
s.sendall(bytes('Message', 'utf-8'))
Update
My service is running on localhost:8001/api/v1/namespaces/my_namespace/services/my_service:http/proxy. How can I connect to it with Python?
As @furas said in the comments:
socket is a primitive object and it doesn't have a specialized method for this - you have to build the message with the correct data yourself. You have to learn the HTTP protocol and use it to send the request.
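To make that concrete, here is a minimal sketch of a hand-rolled HTTP GET over a plain socket, assuming the service answers plain HTTP on localhost:8001. Note that socket.connect() takes a (host, port) tuple; the path belongs in the request line, not in connect():
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('localhost', 8001))

# The request line carries the path; the Host header carries host:port.
request = ('GET /path/to/my/service HTTP/1.1\r\n'
           'Host: localhost:8001\r\n'
           'Connection: close\r\n'
           '\r\n')
s.sendall(request.encode('utf-8'))

# Read until the server closes the connection.
chunks = []
while True:
    data = s.recv(4096)
    if not data:
        break
    chunks.append(data)
s.close()

print(b''.join(chunks).decode('utf-8', errors='replace'))
What you read back includes the status line and headers, which you then have to parse yourself; that is exactly the work requests does for you.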
This is a sample snippet that sends a GET request in Python using the requests library:
import requests
URL = 'http://localhost:8001/path/to/my/service'
response_text = requests.get(URL).text
print(response_text)
This assumes that the Content-Type the URL produces is text. If it is JSON, then a minor change is required:
import requests
URL = 'http://localhost:8001/path/to/my/service'
response_json = requests.get(URL).json()
print(response_json)
There are other ways to achieve the same thing with other libraries, such as the standard library's urllib.
Here is the documentation of the requests library for reference.
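For example, a roughly equivalent GET with the standard library's urllib.request (Python 3) might look like this; a sketch, using the same hypothetical URL:
import urllib.request

URL = 'http://localhost:8001/path/to/my/service'
with urllib.request.urlopen(URL) as response:
    body = response.read().decode('utf-8')
print(body)

# If the endpoint returns JSON instead, parse it yourself:
# import json
# data = json.loads(body)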
sendall() requires bytes, so the string must be encoded:
s.sendall("foobar".encode())
I am trying to scrape using proxies (this proxy server is a free one from the internet); in particular, I would like to use their IP, not my private one. To test my script, I am trying to access "http://whatismyipaddress.com/" to see which IP this site sees. As it turns out, it sees my private IP. Can somebody tell me what's wrong here?
import requests
from fake_useragent import UserAgent

def getMyIP(proxyServer, myPrivateIP):
    scrape_website = "http://whatismyipaddress.com/"
    ua = UserAgent()
    headers = {'User-Agent': ua.random}
    try:
        response = requests.get(scrape_website, headers=headers, proxies={"https": proxyServer})
    except:
        faultString = proxyServer + " did not work; " + "\n"
        print(faultString)
        return
    if myPrivateIP in str(response.content):
        print("They found my private IP.")

proxyServer = "http://103.250.158.23:61219"
myPrivateIP = "xxx.xxx.xxx.xxx"
getMyIP(proxyServer, myPrivateIP)
Two things:
You set an {'https': ...} proxy configuration. This means for any HTTPS requests, it will use that proxy. You're requesting an HTTP URL however, so that proxy isn't getting used. Configure an 'http' proxy instead or in addition.
If the proxy forwards your IP in an HTTP header, and the target server heeds that header, that's tough luck and nothing you can do anything about, besides using a different proxy which doesn't forward your IP. I think point 1 is more likely the issue though.
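A minimal sketch of that first fix, registering the proxy for both schemes (the proxy address is just the one from the question; whether it still works is another matter):
import requests

proxyServer = "http://103.250.158.23:61219"
# Register the proxy for both plain-HTTP and HTTPS requests,
# so it is used regardless of the target URL's scheme.
proxies = {"http": proxyServer, "https": proxyServer}
response = requests.get("http://whatismyipaddress.com/", proxies=proxies)
print(response.status_code)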
I've looked through some of the other posts on this and I hope I'm not duplicating, but I'm stuck on a real head-scratcher with setting a proxy server for urllib2. I'm running the below:
import urllib2
from sys import argv

file, site = argv
uri = 'https://' + site

http_proxy_server = "http://newyork.wonderproxy.com"
http_proxy_port = "11001"
http_proxy_user = "user"
http_proxy_passwd = "password"
http_proxy_full_auth_string = "http://%s:%s@%s:%s" % (http_proxy_user,
                                                      http_proxy_passwd,
                                                      http_proxy_server,
                                                      http_proxy_port)
proxy_handler = urllib2.ProxyHandler({"http": http_proxy_full_auth_string})
opener = urllib2.build_opener(proxy_handler)
urllib2.install_opener(opener)

html = opener.open(uri).read()
print html, 'it opened!'
I'm running this against an IP info site, but try as I might the response always comes out with my non-proxy IP address. When I manually set my proxy through system settings I do get a different response, so I've confirmed it's not an issue with the proxy criteria itself.
Any help that could be offered would be much appreciated!
Well this is a bit silly, but I tried a different example and my connection is working fine now.
import urllib2

proxlist = ['minneapolis.wonderproxy.com', 'newyork.wonderproxy.com']
ports = [0, 1, 2, 3]
for prox in proxlist:
    for port in ports:
        proxy = urllib2.ProxyHandler({'http': 'http://user:password@%s:1100%s' % (prox, port)})
        auth = urllib2.HTTPBasicAuthHandler()
        opener = urllib2.build_opener(proxy, auth, urllib2.HTTPHandler)
        urllib2.install_opener(opener)
        try:
            conn = urllib2.urlopen('http://www.howtofindmyipaddress.com/')
            return_str = conn.read()
            str_find = '<span style="font-size: 80px; color: #22BB22; font-family: Calibri,Arial;">'
            strt = return_str.find(str_find) + len(str_find)
            print prox, port, return_str[strt:return_str.find('</span', strt) - 1]
        except urllib2.URLError:
            print prox, port, 'That\'s a no go'
The only difference I can see is that the second one passes HTTPHandler to build_opener in addition to the ProxyHandler. As I apparently have a solution I'm not too worried, but I would still be interested to know why I had this issue in the first place.
Your question sets the proxy URL to
http://user:password@http://newyork.wonderproxy.com:11001
which isn't valid. If you changed http_proxy_server to newyork.wonderproxy.com then your first solution might work better.
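In other words, something along these lines (untested; user, password, and host are the placeholders from the question):
import urllib2

# Host name only - the scheme belongs on the front of the full string.
http_proxy_server = "newyork.wonderproxy.com"
http_proxy_port = "11001"
http_proxy_user = "user"
http_proxy_passwd = "password"
http_proxy_full_auth_string = "http://%s:%s@%s:%s" % (http_proxy_user,
                                                      http_proxy_passwd,
                                                      http_proxy_server,
                                                      http_proxy_port)
proxy_handler = urllib2.ProxyHandler({"http": http_proxy_full_auth_string})
opener = urllib2.build_opener(proxy_handler)
print opener.open("http://example.com/").read()
Note also that the original script opened an https:// URL while registering only an "http" proxy; as in the earlier answer above, HTTPS requests only go through the proxy if you register an "https" entry as well.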
Scope:
I am currently trying to write a web scraper for this specific page. I have a pretty strong web-crawling background using C#, but this httplib is getting the better of me.
Problem:
When trying to make an HTTP GET request for the page specified above, I get a "Moved Permanently" response that points to the very same URL. I can make the request using the requests lib, but I want to make it work using httplib so I can understand what I am doing wrong.
Code Sample:
I am completely new to Python, so any wrong language guideline or syntax is C#'s fault.
import httplib

# Wrapper for an "HTTP GET" request
class HttpClient(object):
    def HttpGet(self, url, host):
        connection = httplib.HTTPConnection(host)
        connection.request('GET', url)
        return connection.getresponse().read()

# Using the "HttpClient" class
httpclient = HttpClient()

# This is the full URL I need to make a GET request for: https://420101.com/strain-database
httpResponseText = httpclient.HttpGet('/strain-database', 'www.420101.com')
print httpResponseText
I really want to make it work using the httplib library, instead of requests or any other fancy one because I feel like I am missing something really small here.
The problem: I'd had either too little or too much caffeine in my system.
To GET an https URL, I needed the HTTPSConnection class.
Also, there is no 'www' in the address I wanted to GET, so it shouldn't be included in the host.
Both of the wrong addresses redirect me to the correct one with a 301 status code. If I were using requests or a more full-featured module, it would have automatically followed the redirect.
My Validation:
c = httplib.HTTPSConnection('420101.com')
c.request("GET", "/strain-database")
r = c.getresponse()
print r.status, r.reason
200 OK
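For reference, here is roughly what inspecting that redirect by hand looks like with httplib; a sketch against one of the wrong addresses from the question:
import httplib

# Plain HTTP against this host answers with a redirect
# instead of the page itself.
conn = httplib.HTTPConnection('www.420101.com')
conn.request('GET', '/strain-database')
resp = conn.getresponse()
print resp.status, resp.reason      # e.g. 301 Moved Permanently
print resp.getheader('Location')    # where you are being redirected to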
I am trying to write a man-in-the-middle for a webserver (to add extra services, not for nefarious reasons).
I am trying to pass a Host header, since the back-end puts its address, as taken from the Host header, into the reply in lots of unpredictable places.
The original code is hundreds of lines, so I've simplified it to just the salient parts here.
import urllib2
opener = urllib2.build_opener(urllib2.HTTPHandler(debuglevel=1))
opener.addheaders.append(('Host','fakedomain.net'))
res = opener.open('http://www.google.com/doodles/finder/2014/All%20doodles')
res.read()
When I run this code, I expect Host: fakedomain.net to be passed to google's server. However, the debug code clearly shows Host: www.google.com\r\n. Changing Host to HostX works fine.
What is the correct way of sending a Host: header with an opener?
Note: this is a simplification; in the actual code, I am pointing to my own server, etc.
Use urllib2.Request:
import urllib2
opener = urllib2.build_opener(urllib2.HTTPHandler(debuglevel=1))
req = urllib2.Request('http://www.google.com/doodles/finder/2014/All%20doodles')
req.add_unredirected_header('Host', 'fakedomain.net')
res = opener.open(req)
res.read()
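If you are wondering why opener.addheaders cannot do this: as far as I can tell from the urllib2 source, AbstractHTTPHandler.do_request_ first sets Host as an unredirected header derived from the request's URL, and only then copies the opener's addheaders onto the request, skipping any entry whose name is already present. A 'Host' entry in addheaders is therefore silently ignored, while an unrecognized name like 'HostX' sails through - which matches exactly what the debug output shows.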
Thanks to Satoru, whose answer was almost what I was looking for and certainly got me on the right track.
The correct answer is:
import urllib2
opener = urllib2.build_opener(urllib2.HTTPHandler(debuglevel=1))
req = urllib2.Request('http://www.google.com/doodles/finder/2014/All%20doodles',None,{"Host":"fakedomain.net"})
res = opener.open(req)
res.read()
Sorry Satoru, I don't want to select your answer as correct, in case someone else finds my question, but I have upvoted it.