urllib2 error 'Not found on Accelerator' - python

I have a Python program that periodically checks the weather from weather.yahooapis.com, but it always throws the error urllib2.HTTPError: HTTP Error 404: Not Found on Accelerator. I have tried on two different computers with no luck, as well as changing my DNS settings, and I continue to get the error. Here is my code:
#!/usr/bin/python
import time
#from Adafruit_CharLCDPlate import Adafruit_CharLCDPlate
from xml.dom import minidom
import urllib2
#towns, as woeids
towns = [2365345,2366030,2452373]
val = 1
while val == 1:
time.sleep(2)
for i in towns:
mdata = urllib2.urlopen('http://206.190.43.214/forecastrss?w='+str(i)+'&u=f')
sdata = minidom.parseString(mdata)
atm = sdata.getElementsByTagName('yweather:atmosphere')[0]
current = sdata.getElementsByTagName('yweather:condition')[0]
humid = atm.attributes['humidity'].value
tempf = current.attributes['temp'].value
print(tempf)
time.sleep(8)
I can successfully access the output of the API through a web browser on the same computers that give me the error.

The problem is that you're using the IP address 206.190.43.214 rather than the hostname weather.yahooapis.com.
Even though they resolve to the same host (206.190.43.214, obviously), the name that's actually in the URL ends up as the Host: header in the HTTP request. And you can tell that this makes the difference here:
$ curl 'http://206.190.43.214/forecastrss?w=2365345&u=f'
<404 error>
$ curl 'http://weather.yahooapis.com/forecastrss?w=2365345&u=f'
<correct rss>
$ curl 'http://206.190.43.214/forecastrss?w=2365345&u=f' -H 'Host: weather.yahooapis.com'
<correct rss>
If you test the two URLs in your browser, you will see the same thing.
So, in your code, you have two choices. You can use the DNS name instead of the IP address:
mdata = urllib2.urlopen('http://weather.yahooapis.com/forecastrss?w='+str(i)+'&u=f')
… or you can use the IP address and add the Host header manually:
req = urllib2.Request('http://206.190.43.214/forecastrss?w='+str(i)+'&u=f')
req.add_header('Host', 'weather.yahooapis.com')
mdata = urllib2.urlopen(req)
There's at least one other problem in your code once you fix this. You can't call minidom.parseString(mdata) when mdata is the file-like object returned by urlopen; you either need to call read() on it, or use parse instead of parseString.
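For example, a minimal sketch of the two options:
# Option 1: read the response body into a string, then parse the string
sdata = minidom.parseString(mdata.read())
# Option 2: let minidom read from the file-like response directly
sdata = minidom.parse(mdata)
Either way, the element lookups that follow stay the same.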

Related

Python and socket - connect to specific path

I need to connect/send a message to http://localhost:8001/path/to/my/service, but I am not able to find out how to do that. I know how to send if I only have localhost and 8001, but I need this specific path, /path/to/my/service. That is where my service is running.
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(<full-url-to-my-service>)
s.sendall(bytes('Message', 'utf-8'))
Update
My service is running on localhost:8001/api/v1/namespaces/my_namespace/services/my_service:http/proxy. How can I connect to it with python?
As @furas said in the comments:
socket is a primitive object and doesn't have a specialized method for this - you have to build the message with the correct data yourself. You have to learn the HTTP protocol and use it to send requests.
Here is a sample snippet that sends a GET request in Python using the requests library:
import requests
URL = 'http://localhost:8001/path/to/my/service'
response_text = requests.get(URL).text
print(response_text)
This assumes the Content-Type that the URL produces is text. If it is JSON, then a minor change is required:
import requests
URL = 'http://localhost:8001/path/to/my/service'
response_json = requests.get(URL).json()
print(response_json)
There are other ways to achieve the same thing using other libraries, such as urllib. See the requests documentation for reference.
sendall() requires bytes, so a string must be encoded:
s.sendall("foobar".encode())

Why does the proxy server use my private IP?

I am trying to scrape using proxies (the proxy server is a free one from the internet); in particular, I would like to use the proxy's IP, not my private one. To test my script I am trying to access "http://whatismyipaddress.com/" to see which IP that site sees. As it turns out, it sees my private IP. Can somebody tell me what's wrong here?
import requests
from fake_useragent import UserAgent
def getMyIP(proxyServer, myPrivateIP):
    scrape_website = "http://whatismyipaddress.com/"
    ua = UserAgent()
    headers = {'User-Agent': ua.random}
    try:
        response = requests.get(scrape_website, headers=headers, proxies={"https": proxyServer})
    except:
        faultString = proxyServer + " did not work; " + "\n"
        print(faultString)
        return
    if myPrivateIP in str(response.content):
        print("They found my private IP.")

proxyServer = "http://103.250.158.23:61219"
myPrivateIP = "xxx.xxx.xxx.xxx"
getMyIP(proxyServer, myPrivateIP)
Two things:
You set an {'https': ...} proxy configuration. This means any HTTPS request will use that proxy. You're requesting an HTTP URL, however, so that proxy isn't getting used. Configure an 'http' proxy instead, or in addition (see the sketch after this list).
If the proxy forwards your IP in an HTTP header, and the target server heeds that header, that's tough luck and there's nothing you can do about it, besides using a different proxy that doesn't forward your IP. I think point 1 is the more likely issue, though.
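A minimal sketch of point 1, registering the same proxy for both schemes so plain-HTTP requests are routed through it too (the proxy address is the one from the question):
import requests

proxyServer = "http://103.250.158.23:61219"
# Cover both schemes; requests picks the entry matching the URL's scheme.
proxies = {"http": proxyServer, "https": proxyServer}
response = requests.get("http://whatismyipaddress.com/", proxies=proxies)
print(response.status_code)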

Curl Equivalent in Python

I have a python program that takes pictures and I am wondering how I would write a program that sends those pictures to a particular URL.
If it matters, I am running this on a Raspberry Pi.
(Please excuse my simplicity, I am very new to all this)
Many folks turn to the requests library for this sort of thing.
For something lower level, you might use urllib2.
I've been using the requests package as well. Here's an example POST from the requests documentation.
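For the picture-sending use case specifically, a minimal sketch with requests might look like this (the URL and the form field name 'file' are placeholders for whatever your server expects):
import requests

# Hypothetical endpoint and field name; adjust to your server's API.
url = 'http://example.com/upload'
with open('picture.jpg', 'rb') as f:
    response = requests.post(url, files={'file': f})
print(response.status_code)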
If you feel that you want to use curl, try pycurl.
Install it using:
sudo pip install pycurl
Here is an example of how to send data using it:
import pycurl
import json
import urllib
import cStringIO
url = 'your_url'
first_param = '12'
dArrayData = [{'data' : 'first'}, {'data':'second'}]
json_to_send = json.dumps(dArrayData, separators=(',',':'), sort_keys=False)
curlClient = pycurl.Curl()
curlClient.setopt(curlClient.USERAGENT, 'curl-user-agent')
# Sets the url of the service
curlClient.setopt(curlClient.URL, url)
# Sets the request to be of the type POST
curlClient.setopt(curlClient.POST, True)
# Sets the params of the post request
send_params = 'first_param=' + first_param + '&data=' + urllib.quote(json_to_send)
curlClient.setopt(curlClient.POSTFIELDS, send_params)
# Setting the buffer for the response to be written to
bufResponse = cStringIO.StringIO()
curlClient.setopt(curlClient.WRITEFUNCTION, bufResponse.write)
# Setting to fail on error
curlClient.setopt(curlClient.FAILONERROR, True)
# Sets the time out for the connections
curlClient.setopt(pycurl.CONNECTTIMEOUT, 25)
curlClient.setopt(pycurl.TIMEOUT, 25)
response = ''
try:
    # Performs the operation
    curlClient.perform()
except pycurl.error as err:
    errno, errString = err
    print '========'
    print 'ERROR sending the data:'
    print '========'
    print 'CURL error code:', errno
    print 'CURL error Message:', errString
else:
    response = bufResponse.getvalue()
    # Do whatever you want with the response, e.g. parse it as JSON.
finally:
    curlClient.close()
    bufResponse.close()
The requests library is the most supported and advanced way to do this.
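For comparison, a rough requests equivalent of the pycurl example above (same hypothetical URL and payload; requests handles the URL-encoding, response buffering, and timeouts itself):
import json
import requests

url = 'your_url'
dArrayData = [{'data': 'first'}, {'data': 'second'}]
payload = {
    'first_param': '12',
    'data': json.dumps(dArrayData, separators=(',', ':')),
}
try:
    response = requests.post(url, data=payload, timeout=25)
    response.raise_for_status()  # raise on 4xx/5xx, like FAILONERROR
except requests.RequestException as err:
    print('ERROR sending the data:', err)
else:
    print(response.text)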

How do I get HTTP header info without authentication using python?

I'm trying to write a small program that will simply display the header information of a website. Here is the code:
import urllib2
url = 'http://some.ip.add.ress/'
request = urllib2.Request(url)
try:
    html = urllib2.urlopen(request)
except urllib2.URLError, e:
    print e.code
else:
    print html.info()
If 'some.ip.add.ress' is google.com then the header information is returned without a problem. However if it's an ip address that requires basic authentication before access then it returns a 401. Is there a way to get header (or any other) information without authentication?
I've worked it out.
After the try block fails due to unauthorized access, the following modification will print the header information:
print e.info()
instead of:
print e.code
Thanks for looking :)
If you want just the headers, instead of using urllib2 you can go lower level and use httplib:
import httplib

host = 'some.ip.add.ress'  # placeholder host from the question
path = '/'                 # placeholder path
conn = httplib.HTTPConnection(host)
conn.request("HEAD", path)
print conn.getresponse().getheaders()
If all you want are the HTTP headers, then you should make a HEAD request, not a GET. You can see how to do this by reading Python - HEAD request with urllib2.
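A minimal sketch of that approach, overriding get_method so urllib2 issues a HEAD request:
import urllib2

class HeadRequest(urllib2.Request):
    def get_method(self):
        return "HEAD"

response = urllib2.urlopen(HeadRequest('http://example.com/'))
print response.info()  # headers only; HEAD responses carry no body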

Http POST Curl in python

I'm having trouble understanding how to issue an HTTP POST request using curl from inside of Python.
I'm trying to post to the Facebook Open Graph. Here is the example they give, which I'd like to replicate exactly in Python.
curl -F 'access_token=...' \
-F 'message=Hello, Arjun. I like this new API.' \
https://graph.facebook.com/arjun/feed
Can anyone help me understand this?
You can use httplib to POST with Python, or the higher-level urllib/urllib2:
import urllib
params = {}
params['access_token'] = '*****'
params['message'] = 'Hello, Arjun. I like this new API.'
params = urllib.urlencode(params)
f = urllib.urlopen("https://graph.facebook.com/arjun/feed", params)
print f.read()
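Note that curl -F sends multipart/form-data rather than the URL-encoded body above. If the endpoint insists on multipart, here is a sketch using requests (the token value is a placeholder):
import requests

url = 'https://graph.facebook.com/arjun/feed'
# A (None, value) tuple makes requests send a plain multipart field
# with no filename, mirroring curl's -F 'name=value' fields.
fields = {
    'access_token': (None, '*****'),
    'message': (None, 'Hello, Arjun. I like this new API.'),
}
response = requests.post(url, files=fields)
print(response.text)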
There is also a Facebook-specific higher-level library for Python that does all the POSTing for you.
https://github.com/pythonforfacebook/facebook-sdk/
https://github.com/facebook/python-sdk
Why do you use curl in the first place?
Python has extensive libraries for Facebook and includes libraries for web requests; calling another program and receiving its output is unnecessary.
That said,
First, from the Python documentation:
data may be a string specifying additional data to send to the server, or None if no such data is needed. Currently HTTP requests are the only ones that use data; the HTTP request will be a POST instead of a GET when the data parameter is provided. data should be a buffer in the standard application/x-www-form-urlencoded format. The urllib.urlencode() function takes a mapping or sequence of 2-tuples and returns a string in this format. The urllib2 module sends HTTP/1.1 requests with the Connection:close header included.
So,
import urllib2, urllib
parameters = {}
parameters['token'] = 'sdfsdb23424'
parameters['message'] = 'Hello world'
target = 'http://www.target.net/work'
parameters = urllib.urlencode(parameters)
handler = urllib2.urlopen(target, parameters)
while True:
    if handler.code < 400:
        print 'done'
        # call your job
        break
    elif handler.code >= 400:
        print 'bad request or error'
        # failed
        break
