# -*- coding: UTF-8 -*-
import urllib.request
import re
import os
os.system("cls")
url=input("Url Link : ")
if(url[0:8]=="https://"):
    url=url[:4]+url[5:]
if(url[0:7]!="http://"):
    url="http://"+url
try :
    try :
        value=urllib.request.urlopen(url,timeout=60).read().decode('cp949')
    except UnicodeDecodeError :
        value=urllib.request.urlopen(url,timeout=60).read().decode('UTF8')
    par='<title>(.+?)</title>'
    result=re.findall(par,value)
    print(result)
except ConnectionResetError as e:
    print(e)
The TimeoutError has disappeared, but now a ConnectionResetError appears. What is this error? Is it a server problem, so that I can't solve it on my side?
Don't give up!
Some websites require specific HTTP headers, in this case User-agent, so you need to set this header in your request.
Change your request like this (lines 17-20 of your code):
# Make request object
request = urllib.request.Request(url, headers={"User-agent": "Python urllib test"})
# Open url using request object
response = urllib.request.urlopen(request, timeout=60)
# read response
data = response.read()
# decode your value
try:
    value = data.decode('CP949')
except UnicodeDecodeError:
    value = data.decode('UTF-8')
You can change "Python urllib test" to anything you want. Almost every server uses the User-agent header for statistical purposes.
Lastly, consider using appropriate whitespace, blank lines, and comments to make your code more readable. It will be good for you.
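Putting it all together, a minimal sketch of the revised script (keeping your https-to-http rewrite and the cp949/UTF-8 fallback; the User-agent string is just a placeholder):
import re
import urllib.error
import urllib.request

url = input("Url Link : ")
if url.startswith("https://"):
    url = "http://" + url[len("https://"):]
elif not url.startswith("http://"):
    url = "http://" + url

# Send a User-agent header so servers that reject anonymous clients respond
request = urllib.request.Request(url, headers={"User-agent": "Python urllib test"})

try:
    response = urllib.request.urlopen(request, timeout=60)
    data = response.read()
    # Try the Korean codepage first, then fall back to UTF-8
    try:
        value = data.decode('cp949')
    except UnicodeDecodeError:
        value = data.decode('UTF-8')
    print(re.findall('<title>(.+?)</title>', value))
except (ConnectionResetError, urllib.error.URLError) as e:
    print(e)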
More reading:
HTTP/1.1: Header Field Definitions - to understand what the User-agent header is.
21.6. urllib.request — Extensible library for opening URLs — Python 3.4.3 documentation - Always read the documentation. Link to the urllib.request.Request section.
Related
I am trying to connect to a page that takes in some values and returns some data in JSON format, using urllib in Python 3.4. I want to save the values returned from the JSON into a CSV file.
This is what I tried...
import json
import urllib.request
url = 'my_link/select?wt=json&indent=true&f=value'
response = urllib.request.Request(url)
response = urllib.request.urlopen(response)
data = response.read()
I am getting the error below:
urllib.error.HTTPError: HTTP Error 505: HTTP Version Not Supported
EDIT: Found a solution to my problem. I answered it below.
You have found a server that apparently doesn't want to talk HTTP/1.1. You could try lying to it by claiming you are using an HTTP/1.0 client instead, by patching the http.client.HTTPConnection class:
import http.client
http.client.HTTPConnection._http_vsn = 10
http.client.HTTPConnection._http_vsn_str = 'HTTP/1.0'
and re-trying your request.
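A minimal end-to-end sketch of that idea (the URL here is a placeholder for your own endpoint):
import http.client
import urllib.request

# Claim to be an HTTP/1.0 client before opening the connection
http.client.HTTPConnection._http_vsn = 10
http.client.HTTPConnection._http_vsn_str = 'HTTP/1.0'

url = 'http://example.com/select?wt=json&indent=true'  # placeholder URL
response = urllib.request.urlopen(url)
data = response.read().decode('utf-8')
print(data)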
I used FancyURLopener and it worked. I found this useful: docs.python.org: urllib.request
url_request = urllib.request.FancyURLopener({})
with url_request.open(url) as url_opener:
    json_data = url_opener.read().decode('utf-8')
with open(file_output, 'w', encoding='utf-8') as output:
    output.write(json_data)
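Since the original goal was a CSV file, here is a hedged sketch of converting the JSON before writing it out; it assumes the payload is a JSON array of flat objects, which may not match the actual response shape:
import csv
import json

records = json.loads(json_data)  # assumes a JSON array of flat objects

with open(file_output, 'w', encoding='utf-8', newline='') as output:
    writer = csv.DictWriter(output, fieldnames=records[0].keys())
    writer.writeheader()
    writer.writerows(records)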
Hope this helps those having the same problems as mine.
I have a Python program that takes pictures, and I am wondering how I would write a program that sends those pictures to a particular URL.
If it matters, I am running this on a Raspberry Pi.
(Please excuse my simplicity, I am very new to all this)
Many folks turn to the requests library for this sort of thing.
For something lower level, you might use urllib2
I've been using the requests package as well. Here's an example POST from the requests documentation.
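As a minimal sketch of such an upload with requests (the URL and form field name are placeholders, not something your server necessarily expects):
import requests

url = 'http://example.com/upload'  # placeholder URL

# Send the picture as a multipart/form-data file upload
with open('picture.jpg', 'rb') as f:
    response = requests.post(url, files={'image': f})

print(response.status_code)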
If you feel you want to use cURL, try PyCurl.
Install it using:
sudo pip install pycurl
Here is an example (Python 2) of how to send data using it:
import pycurl
import json
import urllib
import cStringIO
url = 'your_url'
first_param = '12'
dArrayData = [{'data' : 'first'}, {'data':'second'}]
json_to_send = json.dumps(dArrayData, separators=(',',':'), sort_keys=False)
curlClient = pycurl.Curl()
curlClient.setopt(curlClient.USERAGENT, 'curl-user-agent')
# Sets the url of the service
curlClient.setopt(curlClient.URL, url)
# Sets the request to be of the type POST
curlClient.setopt(curlClient.POST, True)
# Sets the params of the post request
send_params = 'first_param=' + first_param + '&data=' + urllib.quote(json_to_send)
curlClient.setopt(curlClient.POSTFIELDS, send_params)
# Setting the buffer for the response to be written to
bufResponse = cStringIO.StringIO()
curlClient.setopt(curlClient.WRITEFUNCTION, bufResponse.write)
# Setting to fail on error
curlClient.setopt(curlClient.FAILONERROR, True)
# Sets the time out for the connections
curlClient.setopt(pycurl.CONNECTTIMEOUT, 25)
curlClient.setopt(pycurl.TIMEOUT, 25)
response = ''
try:
    # Performs the operation
    curlClient.perform()
except pycurl.error as err:
    errno, errString = err
    print '========'
    print 'ERROR sending the data:'
    print '========'
    print 'CURL error code:', errno
    print 'CURL error Message:', errString
else:
    response = bufResponse.getvalue()
    # Do whatever you want with the response... JSON it or whatever...
finally:
    curlClient.close()
    bufResponse.close()
The requests library is the most supported and advanced way to do this.
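For comparison, a sketch of the same POST using requests (the URL is a placeholder, and the field names simply mirror the PycURL example above):
import json
import requests

url = 'http://example.com/endpoint'  # placeholder URL
dArrayData = [{'data': 'first'}, {'data': 'second'}]

payload = {
    'first_param': '12',
    'data': json.dumps(dArrayData, separators=(',', ':')),
}

# requests URL-encodes the form fields and sets the headers for you
response = requests.post(url, data=payload, timeout=25)
print(response.status_code)
print(response.text)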
I have a custom URL of the form
http://somekey:somemorekey#host.com/getthisfile.json
I tried everything but keep getting errors:
Method 1:
from httplib2 import Http
ipdb> from urllib import urlencode
h=Http()
ipdb> resp, content = h.request("3b8138fedf8:1d697a75c7e50#abc.myshopify.com/admin/shop.json")
Error:
No help on =Http()
Got this method from here
Method 2:
import urllib
urllib.urlopen(url).read()
Error:
*** IOError: [Errno url error] unknown url type: '3b8108519e5378'
I guess something is wrong with the encoding. I tried:
ipdb> url.encode('idna')
*** UnicodeError: label empty or too long
Is there any way to make this complex URL GET call easier?
You are using a PDB-based debugger instead of an interactive Python prompt. h is a command in PDB. Use ! to prevent PDB from trying to interpret the line as a command:
!h = Http()
urllib requires that you pass it a fully qualified URL; your URL is lacking a scheme:
urllib.urlopen('http://' + url).read()
Your URL does not appear to use any international characters in the domain name, so you do not need to use IDNA encoding.
You may want to look into the 3rd-party requests library; it makes interacting with HTTP servers much easier and more straightforward:
import requests
r = requests.get('http://abc.myshopify.com/admin/shop.json', auth=("3b8138fedf8", "1d697a75c7e50"))
data = r.json() # interpret the response as JSON data.
The current de facto HTTP library for Python is Requests.
import requests
response = requests.get(
    "http://abc.myshopify.com/admin/shop.json",
    auth=("3b8138fedf8", "1d697a75c7e50")
)
response.raise_for_status() # Raise an exception if HTTP error occurs
print response.content # Do something with the content.
I've been searching all around for a Python 3.x code sample to get HTTP Header information.
Something as simple as PHP's get_headers cannot be found in Python easily. Or maybe I am not sure how best to wrap my head around it.
In essence, I would like to code something where I can see whether a URL exists or not, something along the lines of:
h = get_headers(url)
if(h[0] == 200)
{
    print("Bingo!")
}
So far, I tried
h = http.client.HTTPResponse('http://docs.python.org/')
But I always got an error.
To get an HTTP response code in python-3.x, use the urllib.request module:
>>> import urllib.request
>>> response = urllib.request.urlopen(url)
>>> response.getcode()
200
>>> if response.getcode() == 200:
... print('Bingo')
...
Bingo
The returned HTTPResponse Object will give you access to all of the headers, as well. For example:
>>> response.getheader('Server')
'Apache/2.2.16 (Debian)'
If the call to urllib.request.urlopen() fails, an HTTPError Exception is raised. You can handle this to get the response code:
import urllib.request

try:
    response = urllib.request.urlopen(url)
    if response.getcode() == 200:
        print('Bingo')
    else:
        print('The response code was not 200, but: {}'.format(
            response.getcode()))
except urllib.error.HTTPError as e:
    print('''An error occurred: {}
The response code was {}'''.format(e, e.getcode()))
For Python 2.x
urllib, urllib2 or httplib can be used here. However, note that urllib and urllib2 use httplib. Therefore, depending on whether you plan to do this check a lot (1000s of times), it would be better to use httplib. Additional documentation and examples are here.
Example code:
import httplib

try:
    h = httplib.HTTPConnection("www.google.com")
    h.connect()
except Exception as ex:
    print "Could not connect to page."
For Python 3.x
A similar story to urllib (or urllib2) and httplib from Python 2.x applies to the urllib.request and http.client libraries in Python 3.x. Again, http.client should be quicker. For more documentation and examples look here.
Example code:
import http.client

try:
    conn = http.client.HTTPConnection("www.google.com")
    conn.connect()
except Exception as ex:
    print("Could not connect to page.")
and if you wanted to check the status codes you would need to replace
conn.connect()
with
conn.request("GET", "/index.html")  # Could also use "HEAD" instead of "GET".
res = conn.getresponse()
if res.status == 200 or res.status == 302:  # Specify codes here.
    print("Page Found!")
Note, in both examples, if you would like to catch the specific exception relating to when the URL doesn't exist, rather than all of them, catch the socket.gaierror exception instead (see the socket documentation).
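A short sketch of that narrower handling (the hostname is deliberately an unresolvable placeholder):
import http.client
import socket

try:
    conn = http.client.HTTPConnection("no-such-host.invalid")
    conn.connect()
except socket.gaierror:
    # gaierror is raised when the hostname cannot be resolved at all
    print("Could not resolve host.")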
You can use the requests module to check it:
import requests

url = "http://www.example.com/"
res = requests.get(url)
if res.status_code == 200:
    print("bingo")
You can also check the header contents before downloading the whole content of the webpage, by using a HEAD request.
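A minimal sketch of that idea, using requests.head() so that only the headers come back:
import requests

# HEAD asks the server for headers only, not the body
res = requests.head("http://www.example.com/")
print(res.status_code)
print(res.headers.get('Content-Type'))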
You can use the urllib2 library:
import urllib2

if urllib2.urlopen(url).code == 200:
    print "Bingo"
I'm trying to write a small program that will simply display the header information of a website. Here is the code:
import urllib2

url = 'http://some.ip.add.ress/'
request = urllib2.Request(url)

try:
    html = urllib2.urlopen(request)
except urllib2.URLError, e:
    print e.code
else:
    print html.info()
If 'some.ip.add.ress' is google.com then the header information is returned without a problem. However, if it's an IP address that requires basic authentication before access, then it returns a 401. Is there a way to get header (or any other) information without authentication?
I've worked it out.
After the try block has failed due to unauthorized access, the following modification will print the header information:
print e.info()
instead of:
print e.code
Thanks for looking :)
If you want just the headers, instead of using urllib2, you should go lower level and use httplib:
import httplib
conn = httplib.HTTPConnection(host)
conn.request("HEAD", path)
print conn.getresponse().getheaders()
If all you want are HTTP headers then you should make a HEAD request, not a GET request. You can see how to do this by reading Python - HEAD request with urllib2.
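For reference, the usual pattern from that answer looks roughly like this (Python 2, matching the urllib2 usage above; the URL is the question's placeholder):
import urllib2

class HeadRequest(urllib2.Request):
    # Override the HTTP verb so urlopen sends HEAD instead of GET
    def get_method(self):
        return "HEAD"

response = urllib2.urlopen(HeadRequest('http://some.ip.add.ress/'))
print response.info()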