I am following Bokeh's User Guide.
The section "Embedding Bokeh Server as a Library" at http://docs.bokeh.org/en/latest/docs/user_guide/server.html#embedding-bokeh-server-as-a-library
refers to a demo in which a Bokeh server is embedded in Flask (https://github.com/bokeh/bokeh/blob/0.12.6/examples/howto/server_embed/flask_embed.py).
It should be straightforward, but I get a Tornado error when I launch it with python flask_embed.py. Does anybody have an idea why?
The page opens correctly in the browser, but there is no plot.
This is the short error message:
ERROR:tornado.application:Uncaught exception GET /bkapp/autoload.js?bokeh-autoload-element=3a711948-3668-4f63-8d0c-8cd1584fb92d&bokeh-app-path=/bkapp&bokeh-absolute-url=http://localhost:5006/bkapp (127.0.0.1)
HTTPServerRequest(protocol='http', host='localhost:5006', method='GET', uri='/bkapp/autoload.js?bokeh-autoload-element=3a711948-3668-4f63-8d0c-8cd1584fb92d&bokeh-app-path=/bkapp&bokeh-absolute-url=http://localhost:5006/bkapp', version='HTTP/1.1', remote_ip='127.0.0.1', headers={'Accept-Language': 'en-US,en;q=0.5', 'Accept-Encoding': 'gzip, deflate', 'Host': 'localhost:5006', 'Accept': '*/*', 'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0', 'Connection': 'keep-alive', 'Referer': 'http://localhost:8080/', 'Cookie': 'username-localhost-8888="2|1:0|10:1501067928|23:username-localhost-8888|44:Y2EwOTUzN2YzNWRiNGQyMDgxZWEyOGMzZDJkOTI4ZWY=|f4f981dd915dc777c70e605b7135bcbbc076b3fe3482999e5ca557cb4abd518e"; _xsrf=2|c711b8e7|f913ccc5c9cc32532c1e67bbd75b6051|1500889250'})
...
HTTPError: HTTP Error 400: Bad Request
ERROR:tornado.access:500 GET /bkapp/autoload.js?bokeh-autoload-element=3a711948-3668-4f63-8d0c-8cd1584fb92d&bokeh-app-path=/bkapp&bokeh-absolute-url=http://localhost:5006/bkapp (127.0.0.1)
And here is the whole traceback:
Opening Flask app with embedded Bokeh application on http://localhost:8080/
ERROR:tornado.application:Uncaught exception GET /bkapp/autoload.js?bokeh-autoload-element=3a711948-3668-4f63-8d0c-8cd1584fb92d&bokeh-app-path=/bkapp&bokeh-absolute-url=http://localhost:5006/bkapp (127.0.0.1)
HTTPServerRequest(protocol='http', host='localhost:5006', method='GET', uri='/bkapp/autoload.js?bokeh-autoload-element=3a711948-3668-4f63-8d0c-8cd1584fb92d&bokeh-app-path=/bkapp&bokeh-absolute-url=http://localhost:5006/bkapp', version='HTTP/1.1', remote_ip='127.0.0.1', headers={'Accept-Language': 'en-US,en;q=0.5', 'Accept-Encoding': 'gzip, deflate', 'Host': 'localhost:5006', 'Accept': '*/*', 'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0', 'Connection': 'keep-alive', 'Referer': 'http://localhost:8080/', 'Cookie': 'username-localhost-8888="2|1:0|10:1501067928|23:username-localhost-8888|44:Y2EwOTUzN2YzNWRiNGQyMDgxZWEyOGMzZDJkOTI4ZWY=|f4f981dd915dc777c70e605b7135bcbbc076b3fe3482999e5ca557cb4abd518e"; _xsrf=2|c711b8e7|f913ccc5c9cc32532c1e67bbd75b6051|1500889250'})
Traceback (most recent call last):
File "/home/alessandro/git-files/python/study_graph2/env/local/lib/python2.7/site-packages/tornado/web.py", line 1511, in _execute
result = yield result
File "/home/alessandro/git-files/python/study_graph2/env/local/lib/python2.7/site-packages/tornado/gen.py", line 1055, in run
value = future.result()
File "/home/alessandro/git-files/python/study_graph2/env/local/lib/python2.7/site-packages/tornado/concurrent.py", line 238, in result
raise_exc_info(self._exc_info)
File "/home/alessandro/git-files/python/study_graph2/env/local/lib/python2.7/site-packages/tornado/gen.py", line 1063, in run
yielded = self.gen.throw(*exc_info)
File "/home/alessandro/git-files/python/study_graph2/env/local/lib/python2.7/site-packages/bokeh/server/views/autoload_js_handler.py", line 31, in get
session = yield self.get_session()
File "/home/alessandro/git-files/python/study_graph2/env/local/lib/python2.7/site-packages/tornado/gen.py", line 1055, in run
value = future.result()
File "/home/alessandro/git-files/python/study_graph2/env/local/lib/python2.7/site-packages/tornado/concurrent.py", line 238, in result
raise_exc_info(self._exc_info)
File "/home/alessandro/git-files/python/study_graph2/env/local/lib/python2.7/site-packages/tornado/gen.py", line 1063, in run
yielded = self.gen.throw(*exc_info)
File "/home/alessandro/git-files/python/study_graph2/env/local/lib/python2.7/site-packages/bokeh/server/views/session_handler.py", line 40, in get_session
session = yield self.application_context.create_session_if_needed(session_id, self.request)
File "/home/alessandro/git-files/python/study_graph2/env/local/lib/python2.7/site-packages/tornado/gen.py", line 1055, in run
value = future.result()
File "/home/alessandro/git-files/python/study_graph2/env/local/lib/python2.7/site-packages/tornado/concurrent.py", line 238, in result
raise_exc_info(self._exc_info)
File "/home/alessandro/git-files/python/study_graph2/env/local/lib/python2.7/site-packages/tornado/gen.py", line 1069, in run
yielded = self.gen.send(value)
File "/home/alessandro/git-files/python/study_graph2/env/local/lib/python2.7/site-packages/bokeh/server/application_context.py", line 177, in create_session_if_needed
self._application.initialize_document(doc)
File "/home/alessandro/git-files/python/study_graph2/env/local/lib/python2.7/site-packages/bokeh/application/application.py", line 121, in initialize_document
h.modify_document(doc)
File "/home/alessandro/git-files/python/study_graph2/env/local/lib/python2.7/site-packages/bokeh/application/handlers/function.py", line 16, in modify_document
self._func(doc)
File "main.py", line 22, in modify_doc
df = pd.read_csv(data_url, parse_dates=True, index_col=0)
File "/home/alessandro/git-files/python/study_graph2/env/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 655, in parser_f
return _read(filepath_or_buffer, kwds)
File "/home/alessandro/git-files/python/study_graph2/env/local/lib/python2.7/site-packages/pandas/io/parsers.py", line 392, in _read
filepath_or_buffer, encoding, compression)
File "/home/alessandro/git-files/python/study_graph2/env/local/lib/python2.7/site-packages/pandas/io/common.py", line 186, in get_filepath_or_buffer
req = _urlopen(url)
File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 435, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 473, in error
return self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 556, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 400: Bad Request
ERROR:tornado.access:500 GET /bkapp/autoload.js?bokeh-autoload-element=3a711948-3668-4f63-8d0c-8cd1584fb92d&bokeh-app-path=/bkapp&bokeh-absolute-url=http://localhost:5006/bkapp (127.0.0.1) 425.75ms
When the page is served, the server tries to load CSV data from an external URL using pandas. I'm not sure whether this example worked before, but right now it seems that pd.read_csv does not percent-encode the URL query, so the remote server cannot handle the raw > and < characters and answers with 400 Bad Request. You can either replace the characters manually (see https://en.wikipedia.org/wiki/Percent-encoding) or use a library for it, such as Python's urllib.
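As a minimal sketch of the urllib route, the query part of a URL can be percent-encoded before it is handed to pd.read_csv. The helper name encode_query and the example.com URL are made up for illustration; they are not part of the Bokeh demo.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_query(url):
    """Percent-encode unsafe characters in the query part of a URL."""
    parts = urlsplit(url)
    # Keep the characters that structure a query string (= and &) and any
    # escapes that are already present (%); unsafe characters such as
    # < and > are turned into %XX sequences.
    safe_query = quote(parts.query, safe="=&%+")
    return urlunsplit(parts._replace(query=safe_query))


# Hypothetical URL, for illustration only.
raw_url = "http://example.com/data.csv?filter=value>10&other=a<b"
encoded_url = encode_query(raw_url)
print(encoded_url)
```

The encoded URL can then be passed to pd.read_csv in place of the raw one. Because % is treated as safe, a URL that is already encoded is left unchanged.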
Why does a request to a website work with curl, but always return 429 when I make it from Python? I tried setting many different user agents, cookies...
curl call:
curl "https://query2.finance.yahoo.com/v10/finance/quoteSummary/GLW?formatted=true&crumb=8ldhetOu7RJ&lang=en-US&region=US&modules=summaryDetail&corsDomain=finance.yahoo.com"
response: {"quoteSummary":{"result":[{"summaryDetail":{"maxAge":1,"priceHint":{"raw":2,"fmt":"2","longFmt":"2"},"previousClose":{"raw":37.12,"fmt":"37.12"},"open":{"raw":37.19,"fmt":"37.19"},"dayLow":{"raw":37.12,"fmt":"37.12"},"dayHigh":{"raw":37.95,"fmt":"37.95"},"regularMarketPreviousClose":{"raw":37.12,"fmt":"37.12"},"regularMarketOpen":{"raw":37.19,"fmt":"37.19"},"regularMarketDayLow":{"raw":37.12,"fmt":"37.12"},"regularMarketDayHigh":{"raw":37.95,"fmt":"37.95"},"dividendRate":{"raw":0.88,"fmt":"0.88"},"dividendYield":{"raw":0.0232,"fmt":"2.32%"},"exDividendDate":{"raw":1605139200,"fmt":"2020-11-12"},"payoutRatio":{"raw":3.3077,"fmt":"330.77%"},"fiveYearAvgDividendYield":{"raw":2.43,"fmt":"2.43"},"beta":{"raw":1.173753,"fmt":"1.17"},"trailingPE":{"raw":148.82353,"fmt":"148.82"},"forwardPE":{"raw":20.294119,"fmt":"20.29"},"volume":{"raw":3372416,"fmt":"3.37M","longFmt":"3,372,416"},"regularMarketVolume":{"raw":3372416,"fmt":"3.37M","longFmt":"3,372,416"},"averageVolume":{"raw":4245485,"fmt":"4.25M","longFmt":"4,245,485"},"averageVolume10days":{"raw":3351485,"fmt":"3.35M","longFmt":"3,351,485"},"averageDailyVolume10Day":{"raw":3351485,"fmt":"3.35M","longFmt":"3,351,485"},"bid":{"raw":37.88,"fmt":"37.88"},"ask":{"raw":37.89,"fmt":"37.89"},"bidSize":{"raw":1100,"fmt":"1.1k","longFmt":"1,100"},"askSize":{"raw":800,"fmt":"800","longFmt":"800"},"marketCap":{"raw":28994179072,"fmt":"28.99B","longFmt":"28,994,179,072"},"yield":{},"ytdReturn":{},"totalAssets":{},"expireDate":{},"strikePrice":{},"openInterest":{},"fiftyTwoWeekLow":{"raw":17.44,"fmt":"17.44"},"fiftyTwoWeekHigh":{"raw":37.95,"fmt":"37.95"},"priceToSalesTrailing12Months":{"raw":2.6921244,"fmt":"2.69"},"fiftyDayAverage":{"raw":35.406857,"fmt":"35.41"},"twoHundredDayAverage":{"raw":31.052786,"fmt":"31.05"},"trailingAnnualDividendRate":{"raw":0.86,"fmt":"0.86"},"trailingAnnualDividendYield":{"raw":0.023168104,"fmt":"2.32%"},"navPrice":{},"currency":"USD","fromCurrency":null,"toCurrency":null,"lastMarket":null,"
volume24Hr":{},"volumeAllCurrencies":{},"circulatingSupply":{},"algorithm":null,"maxSupply":{},"startDate":{},"tradeable":false}}],"error":null}}
With Python:
import requests
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'}
result = requests.get('https://query2.finance.yahoo.com/v10/finance/quoteSummary/GLW?formatted=true&crumb=8ldhetOu7RJ&lang=en-US&region=US&modules=summaryDetail&corsDomain=finance.yahoo.com', headers=headers)
print(result.content)
response:
Traceback (most recent call last):
File "a.py", line 35, in <module>
response = urllib.request.urlopen(req, jsondataasbytes)
File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 503, in _call_chain
result = func(*args)
File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 429: Too Many Requests
OK, solved by passing the GET parameters via params:
import requests
payload = {"modules": "summaryDetail"}
response = requests.get("https://query2.finance.yahoo.com/v10/finance/quoteSummary/GLW", params=payload)
print(response.json())
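For reference, requests turns the params dictionary into an encoded query string and appends it to the URL. The standard-library sketch below shows roughly the same step without sending a request; the extra lang and region keys are taken from the curl call in the question and are only there to show how several parameters are joined.

```python
from urllib.parse import urlencode

base_url = "https://query2.finance.yahoo.com/v10/finance/quoteSummary/GLW"
# Each key/value pair is percent-encoded and the pairs are joined with "&".
payload = {"modules": "summaryDetail", "lang": "en-US", "region": "US"}
full_url = base_url + "?" + urlencode(payload)
print(full_url)
```

Building the query this way avoids hand-typed separators, which is where a pasted URL is most likely to get damaged.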
My code is supposed to fetch data from a specific JSON-like URL (the output the page offers is not the plain JSON I need). When I fetch it over connection A, it returns the following error:
Traceback (most recent call last):
File "C:\Users\nihadazimli\AppData\Local\Programs\Python\Python35\lib\threading.py", line 914, in _bootstrap_inner
self.run()
File "G:/Internship/quantsol-text/web-crawler/mynet_new/date_gaining.py", line 20, in run
main_func(self.counter)
File "G:/Internship/quantsol-text/web-crawler/mynet_new/date_gaining.py", line 166, in main_func
total=url_to_dict(url)
File "G:/Internship/quantsol-text/web-crawler/mynet_new/date_gaining.py", line 79, in url_to_dict
data = urllib.request.urlopen(url).read().decode('utf-8')
File "C:\Users\nihadazimli\AppData\Local\Programs\Python\Python35\lib\urllib\request.py", line 163, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\nihadazimli\AppData\Local\Programs\Python\Python35\lib\urllib\request.py", line 472, in open
response = meth(req, response)
File "C:\Users\nihadazimli\AppData\Local\Programs\Python\Python35\lib\urllib\request.py", line 582, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Users\nihadazimli\AppData\Local\Programs\Python\Python35\lib\urllib\request.py", line 510, in error
return self._call_chain(*args)
File "C:\Users\nihadazimli\AppData\Local\Programs\Python\Python35\lib\urllib\request.py", line 444, in _call_chain
result = func(*args)
File "C:\Users\nihadazimli\AppData\Local\Programs\Python\Python35\lib\urllib\request.py", line 590, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
Interestingly, when I fetch the data over connection B it works fine; however, I get the following error after 10000-20000 iterations:
Exception in thread Thread-9:
Traceback (most recent call last):
File "C:\Users\nihadazimli\AppData\Local\Programs\Python\Python35\lib\urllib\request.py", line 1254, in do_open
h.request(req.get_method(), req.selector, req.data, headers)
File "C:\Users\nihadazimli\AppData\Local\Programs\Python\Python35\lib\http\client.py", line 1106, in request
self._send_request(method, url, body, headers)
File "C:\Users\nihadazimli\AppData\Local\Programs\Python\Python35\lib\http\client.py", line 1151, in _send_request
self.endheaders(body)
File "C:\Users\nihadazimli\AppData\Local\Programs\Python\Python35\lib\http\client.py", line 1102, in endheaders
self._send_output(message_body)
File "C:\Users\nihadazimli\AppData\Local\Programs\Python\Python35\lib\http\client.py", line 934, in _send_output
self.send(msg)
File "C:\Users\nihadazimli\AppData\Local\Programs\Python\Python35\lib\http\client.py", line 877, in send
self.connect()
File "C:\Users\nihadazimli\AppData\Local\Programs\Python\Python35\lib\http\client.py", line 849, in connect
(self.host,self.port), self.timeout, self.source_address)
File "C:\Users\nihadazimli\AppData\Local\Programs\Python\Python35\lib\socket.py", line 711, in create_connection
raise err
File "C:\Users\nihadazimli\AppData\Local\Programs\Python\Python35\lib\socket.py", line 702, in create_connection
sock.connect(sa)
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
I searched the internet for several hours about the error with connection B. It mainly occurs because of a connection problem or a proxy. I tried this solution with several different proxies, but it did not work either; it gave the same error after a few thousand iterations:
proxy_support = urllib.request.ProxyHandler({"http": "http://208.83.106.105:9999"})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)
The problematic part is the following:
class myThread (threading.Thread):
    def __init__(self, threadID, name, counter):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.name = name
        self.counter = counter
    def run(self):
        main_func(self.counter)
def url_to_dict(url):
    hdr = {
        'User-Agent': 'Chrome/60.0.3112.101 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Safari/537.11 Mozilla/55.0.2',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
        'Accept-Encoding': 'none',
        'Accept-Language': 'en-US,en;q=0.8',
        'Connection': 'keep-alive'}
    data2= urllib.request.Request(url,headers= {'User-Agent': 'Mozilla/5.0'})
    # proxy_support = urllib.request.ProxyHandler({"http": "http://61.233.25.166:80"})
    # opener = urllib2.build_opener(proxy_support)
    # urllib2.install_opener(opener)
    data = urllib.request.urlopen(url).read().decode('utf-8')
    json_type_string = re.findall('({.*})', data)[0]
    json_data = json.loads(json_type_string)
    total_page = json_data['data']['totalPage']
    return json_data,total_page
def main_func(counter):
    proxy_support = urllib.request.ProxyHandler({"http": "http://208.83.106.105:9999"})
    opener = urllib.request.build_opener(proxy_support)
    urllib.request.install_opener(opener)
    for x in range(len(url_list)):
        url=url_list[x]
        company_name=company_list[x]
        total=url_to_dict(url)
        total_page=total[1]
        for y in range(int(total_page/10)):
            index = url.find('config[page]=')
            index2 = url.find('&config[reply')
            k = y*10
            url = url[:index+13] + str(counter+k) + url[index2:]
            print(url)
            data = url_to_dict(url)
            parsed_data = get_data(data)
            add_to_mongo(parsed_data,company_name)
What can I do to fix this problem? Also, what is the cause of the 404 Not Found error?
Thanks in advance
This is not an answer (I still can't comment), but did you try the 'requests' library? I think it is more powerful and newer, so ..
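Whichever library is used, a long-running crawler usually benefits from an explicit timeout and a few retries, since a connection attempt that fails once after thousands of iterations often succeeds on a second try. The sketch below is one way to do that with the standard library. The names fetch_with_retry and fetch, the retry count and the backoff delays are assumptions for illustration, not taken from the question. Unlike the posted url_to_dict, fetch passes the Request object (with its User-Agent header) to urlopen.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(fetch, url, retries=3, backoff=1.0, sleep=time.sleep):
    """Call fetch(url), retrying on connection errors with growing delays."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            return fetch(url)
        except urllib.error.HTTPError:
            # The server answered (e.g. 404), so retrying is unlikely to help.
            raise
        except OSError as err:
            # Covers URLError, TimeoutError and socket timeouts.
            last_error = err
            sleep(backoff * 2 ** attempt)
    raise last_error


def fetch(url, timeout=10):
    """Download a page as text, sending a User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Inside the crawler loop this would be called as fetch_with_retry(fetch, url). HTTPError is caught before OSError because it is a subclass of it; that way a 404 is reported immediately instead of being retried.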
import urllib2
def GetBrowserHtml_content(url):
    req_header = {'User-Agent':'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
                  'Accept':'text/html;q=0.9,*/*;q=0.8',
                  'Accept-Charset':'ISO-8859-1,utf-8,gbk;q=0.7,*;q=0.3',
                  'Connection':'close',
                  'Referer':None
                  }
    req_timeout = 5
    request = urllib2.Request(url,None,req_header)
    response = urllib2.urlopen(request,None,req_timeout)
    html_content = response.read()
    return html_content
url = 'http://www.ccdi.gov.cn/jlsc/index_4.html'
html_content = GetBrowserHtml_content(url)
I have a piece of code like the one above.
When I run the code, I get the following error:
Traceback (most recent call last):
File "E:/Programming/python/CWSeg/spider/hahahha.py", line 34, in <module>
html_content = GetBrowserHtml_content(url)
File "E:/Programming/python/CWSeg/spider/hahahha.py", line 22, in GetBrowserHtml_content
response = urllib2.urlopen(request,None,req_timeout)
File "C:\Python27\lib\urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "C:\Python27\lib\urllib2.py", line 406, in open
response = meth(req, response)
File "C:\Python27\lib\urllib2.py", line 519, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python27\lib\urllib2.py", line 444, in error
return self._call_chain(*args)
File "C:\Python27\lib\urllib2.py", line 378, in _call_chain
result = func(*args)
File "C:\Python27\lib\urllib2.py", line 527, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 521:
Can anyone point out what I'm doing wrong? Thanks in advance.
The following code:
req = urllib.request.Request(url=r"http://borel.slu.edu/cgi-bin/cc.cgi?foirm_ionchur=im&foirm=Seol&hits=1&format=xml",headers={'User-Agent':' Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0'})
handler = urllib.request.urlopen(req)
is giving me the following exception:
Traceback (most recent call last):
File "C:/Users/Foo/lang/old/test.py", line 46, in <module>
rip()
File "C:/Users/Foo/lang/old/test.py", line 36, in rip
handler = urllib.request.urlopen(req)
File "C:\Python32\lib\urllib\request.py", line 138, in urlopen
return opener.open(url, data, timeout)
File "C:\Python32\lib\urllib\request.py", line 375, in open
response = meth(req, response)
File "C:\Python32\lib\urllib\request.py", line 487, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python32\lib\urllib\request.py", line 413, in error
return self._call_chain(*args)
File "C:\Python32\lib\urllib\request.py", line 347, in _call_chain
result = func(*args)
File "C:\Python32\lib\urllib\request.py", line 495, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 500: Internal Server Error
but it works fine in my browser. What's the issue?
The server is rather b0rken. It responds with a 500 error in the browser as well.
You can catch the exception and still read the response:
import urllib.request
from urllib.error import HTTPError
req = urllib.request.Request(url=r"http://borel.slu.edu/cgi-bin/cc.cgi?foirm_ionchur=im&foirm=Seol&hits=1&format=xml",headers={'User-Agent':' Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0'})
try:
    handler = urllib.request.urlopen(req)
except HTTPError as e:
    content = e.read()
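To see this pattern work without depending on the remote site, here is a self-contained sketch that starts a throwaway local server which always answers 500 with a body, then reads that body from the exception. BrokenHandler, read_even_on_error and the response text are made up for the demonstration.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError


class BrokenHandler(BaseHTTPRequestHandler):
    """Always answers 500, but still sends a body."""

    def do_GET(self):
        body = b"<error>still useful</error>"
        self.send_response(500)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def read_even_on_error(url):
    """Return (status, body), also when the server reports an HTTP error."""
    # An empty ProxyHandler makes the local request bypass any system proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=5) as response:
            return response.status, response.read()
    except HTTPError as e:
        # HTTPError is file-like, so the error page can be read from it.
        return e.code, e.read()


server = HTTPServer(("127.0.0.1", 0), BrokenHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    status, content = read_even_on_error("http://127.0.0.1:%d/" % server.server_port)
finally:
    server.shutdown()
    server.server_close()
print(status, content)
```

The same read_even_on_error function can be pointed at a real URL; whether the body of an error response is useful depends entirely on the server.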
When it happened to me, I reduced the plt.figure size parameter and it worked. It may be that some odd parameter in your code cannot be read.