I am trying to pull information from a site every 5 seconds, but it doesn't seem to be working and I get errors every time I run it.
Code below:
import urllib2, threading

def readpage():
    data = urllib2.urlopen('http://forums.zybez.net/runescape-2007-prices').read()
    for line in data:
        if 'forums.zybez.net/runescape-2007-prices/player/' in line:
            a = line.split('/runescape-2007-prices/player/'[1])
            print(a.split('">')[0])

t = threading.Timer(5.0, readpage)
t.start()
I get these errors:
Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Python27\lib\threading.py", line 808, in __bootstrap_inner
    self.run()
  File "C:\Python27\lib\threading.py", line 1080, in run
    self.function(*self.args, **self.kwargs)
  File "C:\Users\Jordan\Desktop\username.py", line 3, in readpage
    data = urllib2.urlopen('http://forums.zybez.net/runescape-2007-prices').read()
  File "C:\Python27\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 410, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 403: Forbidden
Help would be appreciated, thanks!
The site is rejecting the default User-Agent sent by urllib2. You can change it for all requests in the script using install_opener:
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0 (X11; Linux i686; rv:5.0) Gecko/20100101 Firefox/5.0')]
urllib2.install_opener(opener)
You'll also need to split the data returned by the site so that you can read it line by line:
urllib2.urlopen('http://forums.zybez.net/runescape-2007-prices').read().splitlines()
and change
line.split('/runescape-2007-prices/player/'[1])
to
line.split('/runescape-2007-prices/player/')[1]
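The misplaced bracket in the original indexes the string literal itself rather than the list returned by split, so the code was really splitting on a single character:

>>> '/runescape-2007-prices/player/'[1]
'r'
>>> 'foo/runescape-2007-prices/player/bar'.split('/runescape-2007-prices/player/')[1]
'bar'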
Working:
import urllib2, threading

opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0 (X11; Linux i686; rv:5.0) Gecko/20100101 Firefox/5.0')]
urllib2.install_opener(opener)

def readpage():
    data = urllib2.urlopen('http://forums.zybez.net/runescape-2007-prices').read().splitlines()
    for line in data:
        if 'forums.zybez.net/runescape-2007-prices/player/' in line:
            a = line.split('/runescape-2007-prices/player/')[1]
            print(a.split('">')[0])

t = threading.Timer(5.0, readpage)
t.start()
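One caveat: threading.Timer runs its function only once, so the code above performs a single fetch five seconds after starting. Since the goal is to poll every 5 seconds, a minimal sketch of one way to do that (same opener setup, with readpage rescheduling itself at the end of each run) looks like this:

import urllib2, threading

opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0 (X11; Linux i686; rv:5.0) Gecko/20100101 Firefox/5.0')]
urllib2.install_opener(opener)

def readpage():
    data = urllib2.urlopen('http://forums.zybez.net/runescape-2007-prices').read().splitlines()
    for line in data:
        if 'forums.zybez.net/runescape-2007-prices/player/' in line:
            a = line.split('/runescape-2007-prices/player/')[1]
            print(a.split('">')[0])
    threading.Timer(5.0, readpage).start()  # schedule the next fetch in 5 seconds

readpage()  # first fetch runs immediately, then roughly every 5 seconds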
Did you try opening that URL without the thread? The error code says 403: Forbidden; maybe you need authentication for that web page.
This has nothing to do with Python -- the server is denying your requests to that URL.
I suspect that either the URL is incorrect or you've hit some kind of rate limiting and are being blocked.
EDIT: how to make it work
The site is blocking Python's User-Agent. Try this:
import urllib2, threading

def readpage():
    headers = {'User-Agent': 'Mozilla/5.0'}
    req = urllib2.Request('http://forums.zybez.net/runescape-2007-prices', None, headers)
    data = urllib2.urlopen(req).read().splitlines()
    for line in data:
        if 'forums.zybez.net/runescape-2007-prices/player/' in line:
            a = line.split('/runescape-2007-prices/player/')[1]
            print(a.split('">')[0])
Related
I am trying to make some image filters for my API. Some URLs work while most do not, and I want to know why and how to fix it. I have looked through another Stack Overflow post but haven't had much luck, as I don't know what the problem is.
Here is an example of a working URL
And one that does not work
Edit: Here is another URL that does not work
Here is the API I am trying to make
Here is my code
import io
import urllib.request
from io import BytesIO
from PIL import Image

def generate_image_Wanted(imageUrl):
    with urllib.request.urlopen(imageUrl) as url:
        f = io.BytesIO(url.read())
    im1 = Image.open("images/wanted.jpg")
    im2 = Image.open(f)
    im2 = im2.resize((300, 285))
    img = im1.copy()
    img.paste(im2, (85, 230))
    d = BytesIO()
    img.save(d, "PNG")
    d.seek(0)
    return d
Here is my error
Traceback (most recent call last):
  File "c:\Users\micha\OneDrive\Desktop\MicsAPI\test.py", line 23, in <module>
    generate_image_Wanted("https://cdn.discordapp.com/avatars/902240397273743361/9d7ce93e7510f47da2d8ba97ec32fc33.png")
  File "c:\Users\micha\OneDrive\Desktop\MicsAPI\test.py", line 11, in generate_image_Wanted
    with urllib.request.urlopen(imageUrl) as url:
  File "C:\Users\micha\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\micha\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 523, in open
    response = meth(req, response)
  File "C:\Users\micha\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 632, in http_response
    response = self.parent.error(
  File "C:\Users\micha\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 561, in error
    return self._call_chain(*args)
  File "C:\Users\micha\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 494, in _call_chain
    result = func(*args)
  File "C:\Users\micha\AppData\Local\Programs\Python\Python39\lib\urllib\request.py", line 641, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Thank you for looking at this and have a good day.
The sites you can't scrape probably have server-side protection against known bots and spiders, and they block requests coming from urllib.
You need to provide some headers; see also the Python requests library for a higher-level way to do this.
Working example:
import urllib.request
hdr = { 'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)' }
url = "https://cdn.discordapp.com/avatars/902240397273743361/9d7ce93e7510f47da2d8ba97ec32fc33.png"
req = urllib.request.Request(url, headers=hdr)
response = urllib.request.urlopen(req)
response.read()
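Since the answer points at the requests library, here is an equivalent sketch using requests instead of urllib.request (assuming requests is installed; the URL and User-Agent are the same as above):

import requests

hdr = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'}
url = "https://cdn.discordapp.com/avatars/902240397273743361/9d7ce93e7510f47da2d8ba97ec32fc33.png"

response = requests.get(url, headers=hdr)
response.raise_for_status()   # raises an HTTPError for 4xx/5xx responses
image_bytes = response.content  # raw PNG bytes, ready for io.BytesIO / PIL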
This code works well on my local machine, but when I upload and run it on pythonanywhere.com it gives me this error.
My Code:
url = "http://www.codeforces.com/api/contest.list?gym=false"
hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
'Accept-Encoding': 'none',
'Accept-Language': 'en-US,en;q=0.8',
'Connection': 'keep-alive'}
req = urllib2.Request(url, headers=hdr)
opener = urllib2.build_opener()
openedReq = opener.open(req, timeout=300)
The error:
Traceback (most recent call last):
  File "/home/GehadAbdallah/main.py", line 135, in openApi
    openedReq = opener.open(req, timeout=300)
  File "/usr/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
P.S. I'm working on Python 2.7.
Free accounts on PythonAnywhere are restricted to a whitelist of sites, http/https only, and access goes via a proxy. There's more info here:
PythonAnywhere wiki: "why do I get a 403 forbidden error when opening a url?"
I recently used urllib2 in a Flask project on PythonAnywhere, on their free account, to access an API at donorschoose.org.
This might be helpful:
import json, urllib2
from flask import Flask

app = Flask(__name__)

@app.route('/funding')
def fundingByState():
    # route outbound requests through PythonAnywhere's proxy (required on free accounts)
    urllib2.install_opener(urllib2.build_opener(urllib2.ProxyHandler({'http': 'proxy.server:3128'})))
    donors_choose_url = "http://api.donorschoose.org/common/json_feed.html?historical=true&APIKey=DONORSCHOOSE"
    response = urllib2.urlopen(donors_choose_url)
    json_response = json.load(response)
    return json.dumps(json_response)
This does work.
If you're using a paid account but still get this error message, try this thread on the PythonAnywhere forums.
For me, I had to delete the console and then start a new one.
I have a very basic script to download a website using Python urllib2.
This has been working brilliantly for the past six months, but as of this morning it no longer works.
#!/usr/bin/python
import urllib2

proxy_support = urllib2.ProxyHandler({'http': 'http://DOMAIN\USER:PASS@PROXY:PORT/'})
opener = urllib2.build_opener(proxy_support)
urllib2.install_opener(opener)

translink = open('/tmp/trains.html', 'w')
response = urllib2.urlopen('http://translink.com.au')
html = response.read()
translink.write(html)
translink.close()
I am now getting the following error
Traceback (most recent call last):
  File "./gettrains.py", line 7, in <module>
    response = urllib2.urlopen('http://translink.com.au')
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 407, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 520, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 445, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 379, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 528, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 502: Proxy Error ( The HTTP message includes an unsupported header or an unsupported combination of headers. )
I am new to Python; any help would be very much appreciated.
Cheers
#!/usr/bin/python
import requests

proxies = {
    "http": "http://domain\user:pass@proxy:port",
    "https": "http://domain\user:pass@proxy:port",
}

html = requests.get("http://translink.com.au", proxies=proxies)

translink = open('/tmp/trains.html', 'w')
translink.write(html.content)
translink.close()
Try to change a header. For example:
opener = urllib2.build_opener(proxy_support)
opener.addheaders = [('User-Agent', 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)')]
urllib2.install_opener(opener)

I had the same problem some days ago: my proxy didn't accept the default User-Agent header, 'Python-urllib/2.7'.
To simplify things a little, I would avoid setting up the proxy from within Python and simply let your OS manage it for you. You can do this by setting an environment variable (like export http_proxy="your_proxy" on Linux). Then grab the file directly through Python, which you can do with urllib2 or requests; you may also consider the wget module.
It's entirely possible that something changed on your proxy so that it now forwards requests with headers that are no longer acceptable to your final destination. In that case there's very little you can do.
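A minimal sketch of that approach (assuming http_proxy is exported in the environment before the script runs; urllib2 picks it up automatically, so no ProxyHandler is needed):

#!/usr/bin/python
# Beforehand, in the shell:
#   export http_proxy="http://DOMAIN\USER:PASS@PROXY:PORT/"
import urllib2

# urllib2 reads http_proxy from the environment by default
response = urllib2.urlopen('http://translink.com.au')
with open('/tmp/trains.html', 'w') as translink:
    translink.write(response.read())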
I have some code which is very similar to code used here:
https://github.com/jeysonmc/python-google-speech-scripts/blob/master/stt_google.py
Here is my code:
import urllib2

LANG_CODE = 'en-US'  # Language to use
GOOGLE_SPEECH_URL = 'https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&pfilter=2&lang=%s&maxresults=6' % (LANG_CODE)

f = open(filename, 'rb')
flac_cont = f.read()
f.close()

hrs = {"User-Agent": "Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.63 Safari/535.7",
       'Content-type': 'audio/x-flac; rate=16000'}

req = urllib2.Request(GOOGLE_SPEECH_URL, data=flac_cont, headers=hrs)

print "Sending request to Google TTS"
p = urllib2.urlopen(req)
response = p.read()
print "response", response
res = eval(response)['hypotheses']
It seems to get stuck on the urllib2.urlopen(req) line. It gives back this error:
Traceback (most recent call last):
  File "google-speech.py", line 443, in <module>
    GoogleSpeech.text_from_speech(filename)
  File "google-speech.py", line 274, in text_from_speech
    p = urllib2.urlopen(req)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 410, in open
    response = meth(req, response)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 400: Bad Request
I'm not sure what the issue could be.
EDIT: Added the end of my backtrace, which was missing earlier
If the error happens randomly, you can use a graceful retry algorithm, such as the one implemented here:
https://wiki.python.org/moin/PythonDecoratorLibrary#Retry
The idea is that, if for example the URL is currently not reachable, you don't keep retrying blindly, but increase the retry interval to allow the target location to recover, and backoff eventually if the URL cannot be opened at all.
If the error happens every time, you have a different problem and should post the complete stack trace.
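For example, here is a minimal sketch of a bounded retry with exponential backoff (the function name and parameters are illustrative, not from the linked wiki page):

import time
import urllib2

def fetch_with_backoff(req, tries=4, delay=3, backoff=2):
    # Try to open the request, doubling the wait after each failure.
    for attempt in range(tries):
        try:
            return urllib2.urlopen(req)
        except urllib2.HTTPError as e:
            if attempt == tries - 1:
                raise  # out of retries, re-raise the last error
            print "Got %s, retrying in %d seconds..." % (e, delay)
            time.sleep(delay)
            delay *= backoff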
This is what I do to overcome this problem:
while True:
    try:
        p = urllib2.urlopen(req)
        break
    except Exception as e:
        print(e, 'Trying again...')
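One caveat with this loop: it retries immediately and forever, which can hammer the server if the error is persistent. Adding a time.sleep between attempts, or capping the number of tries as in the backoff sketch above, is gentler on the remote end.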