Python -- how to grab images off the internet

How can I grab a picture from a known URL and save it to my computer using Python (v2.6)? Thanks

You can use urllib.urlretrieve.
Copy a network object denoted by a URL to a local file, if necessary.
Example:
>>> import urllib
>>> urllib.urlretrieve('http://i.imgur.com/Ph4Xw.jpg', 'duck.jpg')
('duck.jpg', <httplib.HTTPMessage instance at 0x10118e830>)
# by now the file should be downloaded to 'duck.jpg'

You can use urllib.urlretrieve:
import urllib
urllib.urlretrieve('http://example.com/file.png', './file.png')
If you need more flexibility, use urllib2.

In the absence of any context, the following is a simple example of using standard library modules to make a non-authenticated HTTP GET request:
import urllib2
response = urllib2.urlopen('http://lolcat.com/images/lolcats/1674.jpg')
with open('lolcat.jpg', 'wb') as outfile:
    outfile.write(response.read())
EDIT: urlretrieve() is new to me. I guess you could then turn it into a command-line one-liner... if you're bored.
$ python -c "import urllib; urllib.urlretrieve('http://lolcat.com/images/lolcats/1674.jpg', filename='/tmp/1674.jpg')"

Batteries are included in urllib:
urllib.urlretrieve(yourUrl, fileName)

import urllib2
# "wb" (binary mode) keeps the image bytes intact; text mode "w" can corrupt them on Windows
open("fish.jpg", "wb").write(urllib2.urlopen("http://www.fiskeri.no/Fiskeslag/Fjesing.jpg").read())

Easy.
import urllib
urllib.urlretrieve("http://www.dokuwiki.org/_media/wiki:dokuwiki-128.png","dafile.png")
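For readers on Python 3, where `urllib2` was folded into `urllib.request`, the answers above translate roughly as follows. This is a minimal sketch rather than any answerer's code: the helper name `download` is hypothetical, and it streams the response to disk in binary mode with `shutil.copyfileobj` instead of holding the whole image in memory with a single `read()`.

```python
import shutil
from urllib.request import urlopen


def download(url, dest_path):
    """Save the resource at url to dest_path (hypothetical helper name)."""
    # Binary mode ("wb"): image data must not go through newline translation.
    with urlopen(url) as response, open(dest_path, "wb") as out:
        # Copy in fixed-size chunks rather than one big response.read().
        shutil.copyfileobj(response, out)
    return dest_path
```

Usage would be `download('http://i.imgur.com/Ph4Xw.jpg', 'duck.jpg')`. `urllib.request.urlretrieve(url, filename)` also still exists in Python 3, but the documentation lists it under the legacy interface.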


Trouble finding library that supports http PUT in Python 2.7

I need to perform HTTP PUT operations from Python. Which libraries have been proven to support this? More specifically, I need to PUT key pairs, not upload a file.
I have been trying to work with the restful_lib.py, but I get invalid results from the API that I am testing. (I know the results are invalid because I can fire off the same request with curl from the command line and it works.)
After attending PyCon 2011 I came away with the impression that pycurl might be my solution, so I have been trying to implement that. I have two issues here. First, pycurl renames "PUT" as "UPLOAD", which seems to imply that it is focused on file uploads rather than key pairs. Second, when I try to use it I never seem to get a return from the .perform() step.
Here is my current code:
import pycurl
import urllib
url='https://xxxxxx.com/xxx-rest'
UAM=pycurl.Curl()
def on_receive(data):
    print data
arglist = [
    ('username', 'testEmailAdd@test.com'),
    ('email', 'testEmailAdd@test.com'),
    ('username', 'testUserName'),
    ('givenName', 'testFirstName'),
    ('surname', 'testLastName')]
encodedarg=urllib.urlencode(arglist)
path2= url+"/user/"+"99b47002-56e5-4fe2-9802-9a760c9fb966"
UAM.setopt(pycurl.URL, path2)
UAM.setopt(pycurl.POSTFIELDS, encodedarg)
UAM.setopt(pycurl.SSL_VERIFYPEER, 0)
UAM.setopt(pycurl.UPLOAD, 1) #Set to "PUT"
UAM.setopt(pycurl.CONNECTTIMEOUT, 1)
UAM.setopt(pycurl.TIMEOUT, 2)
UAM.setopt(pycurl.WRITEFUNCTION, on_receive)
print "about to perform"
print UAM.perform()
httplib should manage.
http://docs.python.org/library/httplib.html
There's an example on this page http://effbot.org/librarybook/httplib.htm
urllib and urllib2 are also suggested.
Thank you all for your assistance. I think I have found an answer.
My code now looks like this:
import urllib
import httplib
import lxml
from lxml import etree
url='xxxx.com'
UAM=httplib.HTTPSConnection(url)
arglist = [
    ('username', 'testEmailAdd@test.com'),
    ('email', 'testEmailAdd@test.com'),
    ('username', 'testUserName'),
    ('givenName', 'testFirstName'),
    ('surname', 'testLastName'),
]
encodedarg=urllib.urlencode(arglist)
uuid="99b47002-56e5-4fe2-9802-9a760c9fb966"
path= "/uam-rest/user/"+uuid
UAM.putrequest("PUT", path)
UAM.putheader('content-type','application/x-www-form-urlencoded')
UAM.putheader('Accept','application/com.internap.ca.uam.ama-v1+xml')  # the standard header name is "Accept"
UAM.putheader("Content-Length", str(len(encodedarg)))
UAM.endheaders()
UAM.send(encodedarg)
response = UAM.getresponse()
html = etree.HTML(response.read())
result = etree.tostring(html, pretty_print=True, method="html")
print result
Updated: Now I am getting valid responses. This seems to be my solution. (The pretty print at the end isn't working, but I don't really care, that is just there while I am building the function.)
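In Python 3 the standard library can send a form-encoded PUT without pycurl, because `urllib.request.Request` accepts a `method` argument (Python 3.3+). A minimal sketch follows; the helper name `build_put_request` is hypothetical, and the URL and fields in the test are placeholders modeled on the question.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_put_request(url, fields):
    """Build a form-encoded PUT request (hypothetical helper name)."""
    # urlencode turns [(key, value), ...] into "key=value&key=value"; the body must be bytes.
    body = urlencode(fields).encode("ascii")
    return Request(
        url,
        data=body,
        method="PUT",  # without this, a Request with data defaults to POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Sending it is then `urllib.request.urlopen(build_put_request(url, arglist))`; urllib fills in `Content-Length` itself. With the lower-level module, `http.client.HTTPSConnection(host).request("PUT", path, body, headers)` is the Python 3 counterpart of the `httplib` code above.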

Python's `urlparse`: Adding GET keywords to a URL

I'm doing this:
urlparse.urljoin('http://example.com/mypage', '?name=joe')
And I get this:
'http://example.com/?name=joe'
While I want to get this:
'http://example.com/mypage?name=joe'
What am I doing wrong?
You could use urlparse.urlunparse:
import urlparse
parsed = list(urlparse.urlparse('http://example.com/mypage'))
parsed[4] = 'name=joe'  # index 4 is the query component
urlparse.urlunparse(parsed)
You're experiencing a known bug which affects Python 2.4-2.6.
If you can't change or patch your version of Python, @jd's solution will work around the issue.
However, if you need a more generic solution that works as a standard urljoin would, you can use a wrapper method which implements the workaround for that specific use case, and default to the standard urljoin() otherwise.
For example:
import urlparse
def myurljoin(base, url, allow_fragments=True):
    # Only a bare query string ("?...") needs the workaround; startswith also copes with url == ""
    if not url.startswith("?"):
        return urlparse.urljoin(base, url, allow_fragments)
    if not allow_fragments:
        url = url.split("#", 1)[0]
    parsed = list(urlparse.urlparse(base))
    parsed[4] = url[1:]  # assign the query field (index 4; index 3 is params)
    return urlparse.urlunparse(parsed)
I solved it by bundling Python 2.6's urlparse module with my project. I also had to bundle namedtuple, which is defined in collections, since urlparse uses it.
Are you sure? On Python 2.7:
>>> import urlparse
>>> urlparse.urljoin('http://example.com/mypage', '?name=joe')
'http://example.com/mypage?name=joe'
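On Python 3 the module lives at `urllib.parse`, and its `urljoin` resolves `'?name=joe'` against `http://example.com/mypage` the way the asker expected. If the real goal is adding GET parameters, though, editing the query component directly avoids depending on join semantics and keeps any parameters already present. A minimal sketch; `add_query` is a hypothetical name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_query(url, params):
    """Return url with params appended to its query string (hypothetical helper)."""
    parts = urlsplit(url)  # (scheme, netloc, path, query, fragment)
    # Keep the existing key/value pairs, including blank values, then append the new ones.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    pairs.extend(params.items() if hasattr(params, "items") else params)
    return urlunsplit(parts._replace(query=urlencode(pairs)))
```

Note that the existing query is decoded and re-encoded, so an escape such as `%20` may come back as `+`.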

Is it possible to use python suds to read a wsdl file from the file system?

From suds documentation, I can create a Client if I have a url for the WSDL.
from suds.client import Client
url = 'http://localhost:7080/webservices/WebServiceTestBean?wsdl'
client = Client(url)
I currently have the WSDL file on my file system. Is it possible to use suds to read the WSDL file from my file system instead of hosting it on a web server?
Try using url='file:///path/to/file'.
One-liner
# Python 3
import os
from urllib.parse import urljoin  # a bare "import urllib" does not load urllib.parse
from urllib.request import pathname2url
url = urljoin('file:', pathname2url(os.path.abspath("service.xml")))
This is a more complete one-liner that will:
let you specify just the local path,
get you the absolute path,
and then format it as a file URL.
Based upon:
the comments in the accepted answer and
this https://stackoverflow.com/a/14298190/622276
and, thanks to user Sebastian, updated to Python 3, since new code should avoid legacy Python.
Original for reference
# Python 2 (Legacy Python)
import urlparse, urllib, os
url = urlparse.urljoin('file:', urllib.pathname2url(os.path.abspath("service.xml")))
Using pathlib:
from pathlib import Path
url = Path('resources/your_definition.wsdl').absolute().as_uri()
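Building on the `pathlib` answer, a small helper can turn any local path, including one containing spaces, into a `file://` URL. This is a sketch: the helper name `wsdl_url` is hypothetical, and `suds` is deliberately not imported, so the final `Client(url)` step is left to the usage note.

```python
from pathlib import Path


def wsdl_url(path):
    """Return a file:// URL for a local WSDL file (hypothetical helper)."""
    # resolve() makes the path absolute (as_uri() raises ValueError on relative paths);
    # as_uri() then percent-encodes characters such as spaces.
    return Path(path).resolve().as_uri()
```

The result can then be passed as in the question, e.g. `Client(wsdl_url('service.wsdl'))`, in place of the HTTP URL.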

Given a URL to a text file, what is the simplest way to read the contents of the text file?

In Python, when given the URL for a text file, what is the simplest way to access the contents off the text file and print the contents of the file out locally line-by-line without saving a local copy of the text file?
target_url = "http://www.myhost.com/SomeFile.txt"
#read the file
#print first line
#print second line
#etc
Edit 09/2016: In Python 3 and up use urllib.request instead of urllib2
Actually the simplest way is:
import urllib2  # the lib that handles the url stuff
data = urllib2.urlopen(target_url)  # it's a file-like object and works just like a file
for line in data:  # files are iterable
    print line
You don't even need "readlines", as Will suggested. You could even shorten it to: *
import urllib2
for line in urllib2.urlopen(target_url):
    print line
But remember in Python, readability matters.
However, this is the simplest way but not the safest, because in network programming you often can't be sure the amount of data you receive will match what you expect. So you'd generally do better to read a fixed, reasonable amount of data: something you know is enough for the data you expect, but that will prevent your script from being flooded:
import urllib2
data = urllib2.urlopen("http://www.google.com").read(20000)  # read only 20 000 chars
data = data.split("\n")  # then split it into lines
for line in data:
    print line
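A Python 3 version covering both the plain line-by-line loop and the capped read might look like this. It is a sketch: the name `iter_lines`, the `max_bytes` parameter, and the UTF-8 default are assumptions, and the file's real encoding may differ.

```python
from urllib.request import urlopen


def iter_lines(url, encoding="utf-8", max_bytes=None):
    """Yield decoded lines from url without saving a local copy (hypothetical helper)."""
    with urlopen(url) as response:
        if max_bytes is None:
            # The response iterates over raw byte lines, like a file opened in "rb" mode.
            for raw in response:
                yield raw.decode(encoding).rstrip("\r\n")
        else:
            data = response.read(max_bytes)  # never read more than max_bytes
            # errors="replace" in case the cut falls inside a multi-byte character
            yield from data.decode(encoding, errors="replace").splitlines()
```

`for line in iter_lines(target_url): print(line)` then prints the file line by line; passing e.g. `max_bytes=20000` reproduces the capped read above.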
* Second example in Python 3:
import urllib.request # the lib that handles the url stuff
for line in urllib.request.urlopen(target_url):
    print(line.decode('utf-8'))  # utf-8 or iso8859-1 or whatever the page encoding scheme is
I'm a newbie to Python and the offhand comment about Python 3 in the accepted solution was confusing. For posterity, the code to do this in Python 3 is
import urllib.request
data = urllib.request.urlopen(target_url)
for line in data:
    ...
or alternatively
from urllib.request import urlopen
data = urlopen(target_url)
Note that just import urllib does not work.
The requests library has a simpler interface and works with both Python 2 and 3.
import requests
response = requests.get(target_url)
data = response.text
There's really no need to read line-by-line. You can get the whole thing like this:
import urllib
txt = urllib.urlopen(target_url).read()
import urllib2
for line in urllib2.urlopen("http://www.myhost.com/SomeFile.txt"):
    print line
Another way in Python 3 is to use the urllib3 package.
import urllib3
http = urllib3.PoolManager()
response = http.request('GET', target_url)
data = response.data.decode('utf-8')
This can be a better option than urllib, since urllib3 boasts:
Thread safety.
Connection pooling.
Client-side SSL/TLS verification.
File uploads with multipart encoding.
Helpers for retrying requests and dealing with HTTP redirects.
Support for gzip and deflate encoding.
Proxy support for HTTP and SOCKS.
100% test coverage.
import urllib2
f = urllib2.urlopen(target_url)
for l in f.readlines():
    print l
For me, none of the above responses worked out of the box. Instead, I had to do the following (Python 3):
from urllib.request import urlopen
data = urlopen("[your url goes here]").read().decode('utf-8')
# Do what you need to do with the data.
The requests package works really well for simple use cases, as @Andrew Mao suggested:
import requests
response = requests.get('http://lib.stat.cmu.edu/datasets/boston')
data = response.text
for i, line in enumerate(data.split('\n')):
    print(f'{i} {line}')
Output:
0 The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic
1 prices and the demand for clean air', J. Environ. Economics & Management,
2 vol.5, 81-102, 1978. Used in Belsley, Kuh & Welsch, 'Regression diagnostics
3 ...', Wiley, 1980. N.B. Various transformations are used in the table on
4 pages 244-261 of the latter.
5
6 Variables in order:
Check out the Kaggle notebook on how to extract a dataset/dataframe from a URL.
I do think requests is the best option. Also note the possibility of setting encoding manually.
import requests
response = requests.get("http://www.gutenberg.org/files/10/10-0.txt")
# response.encoding = "utf-8"
hehe = response.text
Just updating the solution suggested by @ken-kinder for Python 2 to work with Python 3:
import urllib.request  # a bare "import urllib" does not load urllib.request
urllib.request.urlopen(target_url).read()
You can also use this simple approach, which saves a local copy:
import requests
url_res = requests.get(url="http://www.myhost.com/SomeFile.txt")
filename = "SomeFile"  # hypothetical local name; the original left filename undefined
with open(filename + ".txt", "wb") as file:
    file.write(url_res.content)

Python error when using urllib.urlopen

When I run this:
import urllib
feed = urllib.urlopen("http://www.yahoo.com")
print feed
I get this output in the interactive window (PythonWin):
<addinfourl at 48213968 whose fp = <socket._fileobject object at 0x02E14070>>
I'm expecting to get the source of the above URL. I know this has worked on other computers (like the ones at school) but this is on my laptop and I'm not sure what the problem is here. Also, I don't understand this error at all. What does it mean? Addinfourl? fp? Please help.
Try this:
print feed.read()
See Python docs here.
urllib.urlopen actually returns a file-like object so to retrieve the contents you will need to use:
import urllib
feed = urllib.urlopen("http://www.yahoo.com")
print feed.read()
In Python 3:
import urllib.request
url = "http://www.yahoo.com"
with urllib.request.urlopen(url) as fh:  # closes the response automatically
    html = fh.read().decode("iso-8859-1")
print(html)
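Rather than hard-coding `iso-8859-1`, the decoding step can use the charset the server declares in its `Content-Type` header, falling back to a default when none is given. A minimal sketch under that assumption; `fetch_text` and the UTF-8 fallback are illustrative choices, not part of the answer.

```python
from urllib.request import urlopen


def fetch_text(url, default_charset="utf-8"):
    """Download url and decode the body using the declared charset, if any."""
    with urlopen(url) as response:
        body = response.read()  # read() returns bytes, not str
        # e.g. "text/html; charset=ISO-8859-1" -> "iso-8859-1"; None if no charset is declared
        charset = response.headers.get_content_charset() or default_charset
    return body.decode(charset)
```

`print(fetch_text("http://www.yahoo.com"))` then prints the page source instead of the `<addinfourl ...>` representation of the response object.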
