The data I am trying to read is in XML format. There is a single space before the XML declaration. I cannot edit this part, as it is hard-coded into the data source; I can only read from it. When the URL is entered in IE, the data comes up. When it is entered in Chrome/Firefox, an error is shown, but the data can be viewed via View Source.
Is there a way in Python to either strip this space off or ignore it, as IE seems to do?
(I tried adding strip() in many places.)
Or is there a way to default to the page source (I think urlopen does this already)?
Here is the line giving the error:
html = urlopen(address).read()
Here is the error:
Traceback (most recent call last):
File "C:\Users\212311674\Desktop\Python Work\M10url.py", line 27, in <module>
html = urlopen(address).read()
File "C:\Python33\lib\urllib\request.py", line 160, in urlopen
return opener.open(url, data, timeout)
File "C:\Python33\lib\urllib\request.py", line 473, in open
response = self._open(req, data)
File "C:\Python33\lib\urllib\request.py", line 491, in _open
'_open', req)
File "C:\Python33\lib\urllib\request.py", line 451, in _call_chain
result = func(*args)
File "C:\Python33\lib\urllib\request.py", line 1272, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "C:\Python33\lib\urllib\request.py", line 1257, in do_open
r = h.getresponse()
File "C:\Python33\lib\http\client.py", line 1131, in getresponse
response.begin()
File "C:\Python33\lib\http\client.py", line 354, in begin
version, status, reason = self._read_status()
File "C:\Python33\lib\http\client.py", line 336, in _read_status
raise BadStatusLine(line)
http.client.BadStatusLine: <?xml version="1.0"?><controller_history_cnd>
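The BadStatusLine message shows that the first line http.client read was the XML itself, which suggests the server replies with the bare document and no HTTP status line, so strip() never gets a chance to run. If that is the case, one possible workaround is to bypass urlopen and read the reply over a plain socket. This is only a sketch under that assumption: fetch_raw and clean_xml are hypothetical helper names, and the HTTP/1.0-style request is a guess at what the data source accepts.

```python
import socket


def fetch_raw(host, path, port=80, timeout=10.0):
    """Send a bare GET request and return everything the server sends back.

    Assumes the server answers with the document itself (no status line,
    no headers) and then closes the connection.
    """
    request = "GET {} HTTP/1.0\r\nHost: {}\r\n\r\n".format(path, host)
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request.encode("ascii"))
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


def clean_xml(raw):
    """Strip surrounding whitespace, e.g. the stray space before <?xml."""
    return raw.strip()
```

The result of clean_xml(fetch_raw(host, path)) can then be handed to an XML parser. If the server does send headers after all, they would appear at the start of the returned bytes and would need to be split off first.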
I'm trying to pull some JSON data from an API using urllib in Python 3.6. It requires header information to be passed for authorization. Here is my code:
import urllib.request, json
headers = {"authorization" : "Bearer {authorization_token}"}
with urllib.request.urlopen("{api_url}", data=headers) as url:
data = json.loads(url.read().decode())
print(data)
And the error message I get:
Traceback (most recent call last):
File "getter.py", line 5, in <module>
with urllib.request.urlopen("{url}", data=headers) as url:
File "AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 526, in open
response = self._open(req, data)
File "AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 544, in _open
'_open', req)
File "AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 504, in _call_chain
result = func(*args)
File "AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 1361, in https_open
context=self._context, check_hostname=self._check_hostname)
File "AppData\Local\Programs\Python\Python36-32\lib\urllib\request.py", line 1318, in do_open
encode_chunked=req.has_header('Transfer-encoding'))
File "AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 1239, in request
self._send_request(method, url, body, headers, encode_chunked)
File "AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 1285, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 1234, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "AppData\Local\Programs\Python\Python36-32\lib\http\client.py", line 1064, in _send_output
+ b'\r\n'
TypeError: can't concat bytes to str
Process finished with exit code 1
I'm not too sure what's going wrong here. I'm not inputting any bytes, so I'm not sure why I'm getting an error telling me I can't concat bytes to str.
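The data argument is expected to be a bytes-like object, which is why passing a dict raises the TypeError. To get past the error you need to do the following:
urllib.request.urlopen("{api_url}", data=bytes(json.dumps(headers), encoding="utf-8"))
Note that data is the request body, so this sends the dictionary as a POST payload rather than as HTTP headers. Headers are set on a urllib.request.Request object instead.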
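Since the question wants the token sent for authorization, a minimal sketch of passing it as a header could look like this. build_request and fetch_json are hypothetical helper names, and the "Bearer" scheme is taken from the question's own code.

```python
import json
import urllib.request


def build_request(api_url, token):
    """Return a GET request carrying the bearer token as an HTTP header."""
    headers = {"Authorization": "Bearer {}".format(token)}
    # No data argument: the request stays a GET and nothing is sent as a body.
    return urllib.request.Request(api_url, headers=headers)


def fetch_json(api_url, token):
    """Open the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(api_url, token)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling fetch_json with the real API URL and token replaces the original urlopen call.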
I'm trying to run a script from an online course and had to change the source code from Python 2 to Python 3 syntax. The script downloads an archive, which I have already converted to:
url = "https://www.cs.cmu.edu/~./enron/enron_mail_20150507.tgz"
urllib.request.urlretrieve(url, filename="../enron_mail_20150507.tgz")
However, something seems to be wrong with the URL, since it gives me the following error:
Traceback (most recent call last):
File "/home/xiaolong/development/Python/udacity_intro_to_machine_learning/localpython/lib/python3.5/encodings/idna.py", line 165, in encode
raise UnicodeError("label empty or too long")
UnicodeError: label empty or too long
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "ud120_code_py35fork/tools/startup.py", line 37, in <module>
data = urllib.request.urlretrieve("http://...")
File "/home/xiaolong/development/Python/udacity_intro_to_machine_learning/localpython/lib/python3.5/urllib/request.py", line 187, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "/home/xiaolong/development/Python/udacity_intro_to_machine_learning/localpython/lib/python3.5/urllib/request.py", line 162, in urlopen
return opener.open(url, data, timeout)
File "/home/xiaolong/development/Python/udacity_intro_to_machine_learning/localpython/lib/python3.5/urllib/request.py", line 465, in open
response = self._open(req, data)
File "/home/xiaolong/development/Python/udacity_intro_to_machine_learning/localpython/lib/python3.5/urllib/request.py", line 483, in _open
'_open', req)
File "/home/xiaolong/development/Python/udacity_intro_to_machine_learning/localpython/lib/python3.5/urllib/request.py", line 443, in _call_chain
result = func(*args)
File "/home/xiaolong/development/Python/udacity_intro_to_machine_learning/localpython/lib/python3.5/urllib/request.py", line 1268, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "/home/xiaolong/development/Python/udacity_intro_to_machine_learning/localpython/lib/python3.5/urllib/request.py", line 1240, in do_open
h.request(req.get_method(), req.selector, req.data, headers)
File "/home/xiaolong/development/Python/udacity_intro_to_machine_learning/localpython/lib/python3.5/http/client.py", line 1083, in request
self._send_request(method, url, body, headers)
File "/home/xiaolong/development/Python/udacity_intro_to_machine_learning/localpython/lib/python3.5/http/client.py", line 1128, in _send_request
self.endheaders(body)
File "/home/xiaolong/development/Python/udacity_intro_to_machine_learning/localpython/lib/python3.5/http/client.py", line 1079, in endheaders
self._send_output(message_body)
File "/home/xiaolong/development/Python/udacity_intro_to_machine_learning/localpython/lib/python3.5/http/client.py", line 911, in _send_output
self.send(msg)
File "/home/xiaolong/development/Python/udacity_intro_to_machine_learning/localpython/lib/python3.5/http/client.py", line 854, in send
self.connect()
File "/home/xiaolong/development/Python/udacity_intro_to_machine_learning/localpython/lib/python3.5/http/client.py", line 826, in connect
(self.host,self.port), self.timeout, self.source_address)
File "/home/xiaolong/development/Python/udacity_intro_to_machine_learning/localpython/lib/python3.5/socket.py", line 693, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
File "/home/xiaolong/development/Python/udacity_intro_to_machine_learning/localpython/lib/python3.5/socket.py", line 732, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
UnicodeError: encoding with 'idna' codec failed (UnicodeError: label empty or too long)
I already tried replacing the ~ and the . in the URL with %7E and %2E, but that didn't help at all.
What's wrong with the URL and how can I fix this?
Exact Python version: 3.5.1
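The last frames of the traceback are in socket.getaddrinfo, which encodes the host name with the 'idna' codec. The error therefore concerns the host part of whatever address Python is connecting to, not the path, which would explain why percent-encoding the ~ and . changes nothing. A small sketch for checking a URL's host name with the same codec follows; hostname_error is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def hostname_error(url):
    """Return the IDNA encoding error for the URL's host name, or None."""
    host = urlsplit(url).hostname or ""
    try:
        # The codec named in the traceback's final error message.
        host.encode("idna")
    except UnicodeError as exc:
        return str(exc)
    return None
```

The cmu.edu URL from the question passes this check, so it may be worth printing the URL that actually reaches urlretrieve on line 37 of startup.py and checking any proxy settings, since either could supply a host name with an empty or overlong label.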
I want to scrape "www.naver.com",
so I tried to do it using the open API.
I wrote the following code:
import urllib.request
import urllib.parse
from bs4 import BeautifulSoup
defaultURL = 'http://openapi.naver.com/search?&'
key = 'key=keyvalue'
target='&target=news'
sort='&sort=sim'
start='&start=1'
display='&display=100'
query='&query='+urllib.parse.quote_plus(str(input("write:")))
fullURL=defaultURL+key+target+sort+start+display+query
print(fullURL)
file=open("C:\\Users\\kimty\\Desktop\\k\\python\\N\\naver_news.txt","w",encoding='utf-8')
f=urllib.request.urlopen(fullURL)
resultXML=f.read()
xmlsoup=BeautifulSoup(resultXML,'html.parser')
items=xmlsoup.find_all('item')
for item in items:
    file.write('---------------------------------------\n')
    file.write('title :'+item.title.get_text(strip=True)+'\n')
    file.write('contents : '+item.description.get_text(strip=True)+'\n')
    file.write('\n')
file.close()
But the Python shell only shows this:
============= RESTART: C:\Users\kimty\Desktop\kpython\N\N.py =============
write:lee
http://openapi.naver.com/search?&key=keyvalue&target=news&sort=sim&start=1&display=100&query=lee
Traceback (most recent call last):
File "C:\Users\kimty\Desktop\k\python\N\N.py", line 19, in <module>
f=urllib.request.urlopen(fullURL)
File "C:\Python34\lib\urllib\request.py", line 161, in urlopen
return opener.open(url, data, timeout)
File "C:\Python34\lib\urllib\request.py", line 464, in open
response = self._open(req, data)
File "C:\Python34\lib\urllib\request.py", line 482, in _open
'_open', req)
File "C:\Python34\lib\urllib\request.py", line 442, in _call_chain
result = func(*args)
File "C:\Python34\lib\urllib\request.py", line 1211, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "C:\Python34\lib\urllib\request.py", line 1186, in do_open
r = h.getresponse()
File "C:\Python34\lib\http\client.py", line 1227, in getresponse
response.begin()
File "C:\Python34\lib\http\client.py", line 386, in begin
version, status, reason = self._read_status()
File "C:\Python34\lib\http\client.py", line 356, in _read_status
raise BadStatusLine(line)
http.client.BadStatusLine: ''
Why is this happening?
What is the Python shell telling me?
I am using Windows 8.1 64-bit and Python 3.4.4.
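http.client.BadStatusLine is a subclass of http.client.HTTPException. It is raised when the server's reply does not start with a status line the client can parse; the empty string '' here means the server closed the connection without sending any response. Maybe your API key is wrong! If I try to access the link with my browser, it also gives me an error.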
This is the exact address you tried to request.
Edit
Some people have fixed this error by importing the http lib.
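Importing http.client also lets the script name the exception and handle it instead of crashing. The following is a rough sketch, not a fix for the underlying server behaviour; try_fetch is a hypothetical helper name.

```python
import http.client
import urllib.request


def try_fetch(url, timeout=10.0):
    """Return (body, None) on success or (None, error description) on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read(), None
    except http.client.BadStatusLine as exc:
        # The server sent no parseable status line (an empty one here).
        return None, "bad status line: {!r}".format(exc.line)
    except (http.client.HTTPException, OSError) as exc:
        # Other protocol errors, plus URLError/HTTPError (OSError subclasses).
        return None, repr(exc)
```

With this, the script can print the error description and carry on, or retry with a different key or URL.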
I am trying Google's Native Client SDK.
The OS is Windows 7. I've already installed Python 2.7.9 and set up the PATH environment variable accordingly.
I also downloaded nacl_sdk.zip from https://developer.chrome.com/native-client/sdk/download and extracted it.
However, when I run the command "naclsdk list" as described on the download page, I get the following messages:
C:\Temp\nacl_sdk>naclsdk list
Traceback (most recent call last):
File "C:\Temp\nacl_sdk\sdk_tools\sdk_update_main.py", line 759, in
sys.exit(main(sys.argv[1:]))
File "C:\Temp\nacl_sdk\sdk_tools\sdk_update_main.py", line 752, in main
InvokeCommand(args)
File "C:\Temp\nacl_sdk\sdk_tools\sdk_update_main.py", line 741, in InvokeCommand
command(options, args[1:], config)
File "C:\Temp\nacl_sdk\sdk_tools\sdk_update_main.py", line 510, in Update
manifest = LoadManifestFromURLs([options.manifest_url] + config.GetSources())
File "C:\Temp\nacl_sdk\sdk_tools\sdk_update_main.py", line 238, in LoadManifestFromURLs
url_stream = UrlOpen(url)
File "C:\Temp\nacl_sdk\sdk_tools\sdk_update_main.py", line 114, in UrlOpen
return url_opener.open(request)
File "C:\python27\lib\urllib2.py", line 431, in open
response = self._open(req, data)
File "C:\python27\lib\urllib2.py", line 449, in _open
'_open', req)
File "C:\python27\lib\urllib2.py", line 409, in _call_chain
result = func(*args)
File "C:\python27\lib\urllib2.py", line 1240, in https_open
context=self._context)
TypeError: do_open() got an unexpected keyword argument 'context'
Traceback (most recent call last):
File "C:\Temp\nacl_sdk\sdk_tools\sdk_update_main.py", line 759, in
sys.exit(main(sys.argv[1:]))
File "C:\Temp\nacl_sdk\sdk_tools\sdk_update_main.py", line 752, in main
InvokeCommand(args)
File "C:\Temp\nacl_sdk\sdk_tools\sdk_update_main.py", line 741, in InvokeCommand
command(options, args[1:], config)
File "C:\Temp\nacl_sdk\sdk_tools\sdk_update_main.py", line 444, in List
manifest = LoadManifestFromURLs([options.manifest_url] + config.GetSources())
File "C:\Temp\nacl_sdk\sdk_tools\sdk_update_main.py", line 238, in LoadManifestFromURLs
url_stream = UrlOpen(url)
File "C:\Temp\nacl_sdk\sdk_tools\sdk_update_main.py", line 114, in UrlOpen
return url_opener.open(request)
File "C:\python27\lib\urllib2.py", line 431, in open
response = self._open(req, data)
File "C:\python27\lib\urllib2.py", line 449, in _open
'_open', req)
File "C:\python27\lib\urllib2.py", line 409, in _call_chain
result = func(*args)
File "C:\python27\lib\urllib2.py", line 1240, in https_open
context=self._context)
TypeError: do_open() got an unexpected keyword argument 'context'
In fact, no matter what command I use, it doesn't seem to work at all.
Does anyone know the solution?
I'm having the same issue.
Found this link: http://forums.udacity.com/questions/100247273/urllib2-may-be-a-bug-to-find-the-appropriate-method-overload
Solved by editing python27/lib/urllib2.py...
Hope it helps!
With an opener created by opener = urllib2.build_opener(), if I try to add a header:
request.add_header('if-modified-since',request.headers.get('last-nodified'))
I get the error code:
Traceback (most recent call last):
File "<pyshell#19>", line 1, in <module>
feeddata = opener.open(request)
File "C:\Python27\lib\urllib2.py", line 391, in open
response = self._open(req, data)
File "C:\Python27\lib\urllib2.py", line 409, in _open
'_open', req)
File "C:\Python27\lib\urllib2.py", line 369, in _call_chain
result = func(*args)
File "C:\Python27\lib\urllib2.py", line 1173, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "C:\Python27\lib\urllib2.py", line 1142, in do_open
h.request(req.get_method(), req.get_selector(), req.data, headers)
File "C:\Python27\lib\httplib.py", line 946, in request
self._send_request(method, url, body, headers)
File "C:\Python27\lib\httplib.py", line 986, in _send_request
self.putheader(hdr, value)
File "C:\Python27\lib\httplib.py", line 924, in putheader
str = '%s: %s' % (header, '\r\n\t'.join(values))
TypeError: sequence item 0: expected string, NoneType found
How do you get around this?
I tried building a class from urllib2.BaseHandler and it doesn't work.
Your traceback says "expected string, NoneType found", from which I deduce that you've stored a None value as a header. Did you really write 'last-nodified'? The header you meant was probably 'last-modified', but even then you should check that it exists, and not reuse it as a header if request.headers.get() returns None.
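The check described above can be sketched as follows. This uses Python 3's urllib.request, where urllib2's Request class now lives; add_if_modified_since is a hypothetical helper name, and previous_headers stands for whatever mapping holds the earlier response's headers.

```python
import urllib.request


def add_if_modified_since(request, previous_headers):
    """Copy Last-Modified from an earlier response into If-Modified-Since.

    Does nothing when the earlier response carried no Last-Modified value.
    """
    last_modified = previous_headers.get("Last-Modified")
    if last_modified is not None:
        request.add_header("If-Modified-Since", last_modified)
    return request
```

Because the header is only added when a value exists, None never reaches putheader and the TypeError cannot occur.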