Using urllib.request to write an image - python

I am trying to use this code to download an image from the given URL:
import urllib.request
resource = urllib.request.urlretrieve("http://farm2.static.flickr.com/1184/1013364004_bcf87ed140.jpg")
output = open("file01.jpg","wb")
output.write(resource)
output.close()
However, I get the following error:
TypeError Traceback (most recent call last)
<ipython-input-39-43fe4522fb3b> in <module>()
41 resource = urllib.request.urlretrieve("http://farm2.static.flickr.com/1184/1013364004_bcf87ed140.jpg")
42 output = open("file01.jpg","wb")
---> 43 output.write(resource)
44 output.close()
TypeError: a bytes-like object is required, not 'tuple'
I get that it's the wrong data type for .write(), but I don't know how to feed resource into output.

Right. Use urllib.request.urlretrieve like this:
import urllib.request

# urlretrieve downloads to a temp file and returns (filename, headers)
resource, headers = urllib.request.urlretrieve("http://farm2.static.flickr.com/1184/1013364004_bcf87ed140.jpg")
with open(resource, "rb") as src:
    image_data = src.read()
with open("file01.jpg", "wb") as f:
    f.write(image_data)
PS: urllib.request.urlretrieve returns a tuple; the first element is the location of a temp file. You can read the bytes of that temp file and save them to a new file.
The official documentation says:
The following functions and classes are ported from the Python 2 module urllib (as opposed to urllib2). They might become deprecated at some point in the future.
So I would recommend using urllib.request.urlopen instead; try the code below:
import urllib.request

resource = urllib.request.urlopen("http://farm2.static.flickr.com/1184/1013364004_bcf87ed140.jpg")
with open("file01.jpg", "wb") as output:
    output.write(resource.read())
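For completeness, a minimal sketch: urlretrieve can also write straight to a named file when you pass the destination path as its second argument, which skips the temp-file step entirely (keeping in mind it is the legacy interface mentioned above):
import urllib.request

# Passing a filename makes urlretrieve save directly to that path;
# it still returns a (filename, headers) tuple
urllib.request.urlretrieve(
    "http://farm2.static.flickr.com/1184/1013364004_bcf87ed140.jpg",
    "file01.jpg",
)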

Related

How to read .net file downloaded by urllib3?

I'm downloading the file airports.net from GitHub with urllib3 and reading it as a graph object with networkx.read_pajek, as follows:
import urllib3
import networkx as nx
http = urllib3.PoolManager()
url = 'https://raw.githubusercontent.com/leanhdung1994/WebMining/main/airports.net'
f = http.request('GET', url)
G = nx.read_pajek(f.data(), encoding = 'UTF-8')
print(G)
Then there is an error
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-16-7728c1228755> in <module>
13 url = 'https://raw.githubusercontent.com/leanhdung1994/WebMining/main/airports.net'
14 f = http.request('GET', url)
---> 15 G = nx.read_pajek(f.data(), encoding = 'UTF-8')
16 print(G)
17
TypeError: 'bytes' object is not callable
Could you please elaborate on how to do so?
Update: If I change f.data() to f.data, then a new error appears
/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py:847: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-2-e96ad6eb1bfb> in <module>()
6 url = 'https://raw.githubusercontent.com/leanhdung1994/WebMining/main/airports.net'
7 f = http.request('GET', url)
----> 8 G = nx.read_pajek(f.data, encoding = 'UTF-8')
9 print(G)
<decorator-gen-781> in read_pajek(path, encoding)
4 frames
/usr/local/lib/python3.6/dist-packages/networkx/readwrite/pajek.py in <genexpr>(.0)
159 for format information.
160 """
--> 161 lines = (line.decode(encoding) for line in path)
162 return parse_pajek(lines)
163
AttributeError: 'int' object has no attribute 'decode'
As can be inferred from the error message (and confirmed in the urllib3 docs), HTTPResponse.data is a property of type bytes, not a method. So you need f.data rather than f.data() to retrieve the value.
Update
Regarding the AttributeError: as can be verified in the networkx docs, the function read_pajek expects its first argument to be a path to a file containing the data (or a file-like object), not the raw bytes themselves. So you can dump the bytes to a file, then pass the path to that file as the argument. There are several options:
Just use a hardcoded filename. This is arguably the simplest and doesn't require additional imports.
import urllib3
import networkx as nx
FILE_NAME = "/tmp/test.net"
http = urllib3.PoolManager()
url = 'https://raw.githubusercontent.com/leanhdung1994/WebMining/main/airports.net'
f = http.request('GET', url)
with open(FILE_NAME, "w") as fh:
    fh.write(f.data.decode())

G = nx.read_pajek(FILE_NAME, encoding='UTF-8')
print(f"G='{G}', G.size={G.size()}")
Use the tempfile standard library module to manage the file for you (i.e. give it a randomized name, then remove it after it is no longer used).
import tempfile
import urllib3
import networkx as nx
http = urllib3.PoolManager()
url = 'https://raw.githubusercontent.com/leanhdung1994/WebMining/main/airports.net'
f = http.request('GET', url)
with tempfile.NamedTemporaryFile() as fh:
    fh.write(f.data)
    fh.flush()  # flush buffered bytes so networkx sees the complete file on disk
    G = nx.read_pajek(fh.name, encoding='UTF-8')

print(f"G='{G}', G.size={G.size()}")
Use io.BytesIO or io.StringIO (an "in-memory file"). This creates an object that lives in RAM but exposes the same API as a regular file on disk. Since RAM access is much (much!) faster, this can be a real performance win. You can't always use it, because RAM is finite, but in your particular case the data is already in memory, so dumping it to disk just so networkx can read it back into memory would be wasted work. With a single, modestly sized download you probably won't notice the difference, but it may come in handy in the future.
import io
import urllib3
import networkx as nx
http = urllib3.PoolManager()
url = 'https://raw.githubusercontent.com/leanhdung1994/WebMining/main/airports.net'
f = http.request('GET', url)
data = io.BytesIO(f.data)
G = nx.read_pajek(data, encoding = 'UTF-8')
print(f"G='{G}', G.size={G.size()}")

How can I write raw data?

I'm testing some things and I keep getting the error "write() argument must be str, not HTTPResponse". Here's my code:
import requests
image="http://www.casperdenhaan.nl/wp-content/uploads/2020/03/Logo.jpg"
savefile=open("image.png","w+")
savefile.write(requests.get(image).raw)
savefile.close()
I can get the raw data, but I can't write it to a new file. Is there a way I could get around this problem?
The .raw attribute of the response object is an HTTPResponse, not bytes. You need .content to get a bytes object:
>>> type(requests.get(image).raw)
urllib3.response.HTTPResponse
>>> type(requests.get(image).content)
bytes
You need to open the file in write binary mode:
open("image.png","wb")
I suggest using a "with" block; that way you don't need to explicitly close the file. Here is a working version of the code:
import requests
url = "http://www.casperdenhaan.nl/wp-content/uploads/2020/03/Logo.jpg"
with open('image.png', 'wb') as f:
    f.write(requests.get(url).content)
Try it this way:
import requests

img_url = "http://www.casperdenhaan.nl/wp-content/uploads/2020/03/Logo.jpg"
# stream=True is required, otherwise .raw has already been consumed
img = requests.get(img_url, stream=True)
with open('image.png', 'wb') as save_file:
    save_file.write(img.raw.read())
This way you don't have to deal with closing the file, and 'wb' opens the file in writable binary mode. Note that .raw is only usable when you pass stream=True to requests.get and read the bytes out with .raw.read().
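If the file were large, one option is to let requests stream the body in chunks via the documented iter_content API instead of holding it all in memory; a minimal sketch:
import requests

img_url = "http://www.casperdenhaan.nl/wp-content/uploads/2020/03/Logo.jpg"

# stream=True defers the download; iter_content yields the body chunk by chunk
with requests.get(img_url, stream=True) as resp:
    resp.raise_for_status()
    with open('image.png', 'wb') as save_file:
        for chunk in resp.iter_content(chunk_size=8192):
            save_file.write(chunk)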

Open an XML file through URL and save it

With Python 3, I want to read an XML web page and save it to my local drive.
Also, if the file already exists, it must be overwritten.
I tested a script like:
import urllib.request
xml = urllib.request.urlopen('URL')
data = xml.read()
file = open("file.xml","wb")
file.writelines(data)
file.close()
But I have an error :
TypeError: a bytes-like object is required, not 'int'
First suggestion: do what even the official urllib docs say and don't use urllib; use requests instead.
Your problem is that you used .writelines(), which expects an iterable of lines, not a bytes object; iterating over bytes yields integers, which is why the error message complains about 'int' (for once in Python, the message is not very helpful). Use .write() instead:
import requests
resp = requests.get('URL')
with open('file.xml', 'wb') as foutput:
    foutput.write(resp.content)
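If you prefer something even shorter, pathlib collapses the save step to one line; Path.write_bytes creates the file or truncates it if it exists, which is exactly the overwrite behaviour you asked for:
from pathlib import Path
import requests

# write_bytes overwrites the file when it already exists
Path('file.xml').write_bytes(requests.get('URL').content)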
I found a solution :
from urllib.request import urlopen

# "w" creates the file or truncates it if it exists, giving the overwrite behaviour
with open("import.xml", "w", encoding="utf-8") as xml:
    xml.write(urlopen('URL').read().decode('utf-8'))
Thanks for your help.

Trying to write a cPickle object but get a 'write' attribute type error

When trying to apply some code I found on the internet in IPython, it comes up with an error:
TypeError Traceback (most recent call last)
<ipython-input-4-36ec95de9a5d> in <module>()
13 all[i] = r.json()
14
---> 15 cPickle.dump(all, outfile)
TypeError: argument must have 'write' attribute
Here's what I have done in order:
outfile = "C:\John\Footy Bants\R COMPLAEX MATHS"
Then, I pasted in the following code:
import requests, cPickle, shutil, time

all = {}
errorout = open("errors.log", "w")

for i in range(600):
    playerurl = "http://fantasy.premierleague.com/web/api/elements/%s/"
    r = requests.get(playerurl % i)
    # skip non-existent players
    if r.status_code != 200: continue
    all[i] = r.json()

cPickle.dump(all, outfile)
Here's the original article to give you an idea of what I'm trying to achieve:
http://billmill.org/fantasypl/
The second argument to cPickle.dump() must be a file object. You passed in a string containing a filename instead.
You need to use the open() function to open a file object for that filename, then pass the file object to cPickle:
with open(outfile, 'wb') as pickle_file:
    cPickle.dump(all, pickle_file)
See the Reading and Writing Files section of the Python tutorial, including why using with when opening a file is a good idea (it'll be closed for you automatically).
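To get the data back later, open the file in binary read mode and unpickle it; a short sketch (in Python 3 the module is plain pickle rather than cPickle):
# Load the pickled dict back from the same path; all_players is just an illustrative name
with open(outfile, 'rb') as pickle_file:
    all_players = cPickle.load(pickle_file)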

Python 3.4 - reading data from a webpage

I'm currently trying to learn how to read from a webpage, and have tried the following:
>>>import urllib.request
>>>page = urllib.request.urlopen("http://docs.python-requests.org/en/latest/", data = None)
>>>contents = page.read()
>>>lines = contents.split('\n')
This gives the following error:
Traceback (most recent call last):
File "<pyshell#4>", line 1, in <module>
lines = contents.split('\n')
TypeError: Type str doesn't support the buffer API
Now I assumed that reading from a URL would be pretty similar to reading from a text file, and that contents would be of type str. Is that not the case?
When I try >>> contents I can see that it is just the HTML document, so why doesn't .split('\n') work? How can I make it work?
Please note that I'm splitting at the newline characters so I can print the webpage line by line.
Following the same train of thought, I then tried contents.readlines() which gave this error:
Traceback (most recent call last):
File "<pyshell#8>", line 1, in <module>
contents.readlines()
AttributeError: 'bytes' object has no attribute 'readlines'
Is the webpage stored in some object called 'bytes'?
Can someone explain to me what is happening here? And how to read the webpage properly?
You need to wrap it in an io.TextIOWrapper() object, which decodes the byte stream for you (utf-8 is a common choice, but change it to the proper encoding if the page uses something else):
import urllib.request
import io
u = urllib.request.urlopen("http://docs.python-requests.org/en/latest/", data=None)
f = io.TextIOWrapper(u, encoding='utf-8')
text = f.read()
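Because the TextIOWrapper behaves like an ordinary text file, you can also iterate it line by line instead of reading everything at once:
for line in f:
    print(line.rstrip('\n'))  # each line arrives already decoded to str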
Decode the bytes object to produce a string:
lines = contents.decode(encoding="UTF-8").split("\n")
The return type of the read() method is bytes. You need to decode it to a string before you can use a string method like split. Assuming it is UTF-8, you can use:
s = contents.decode('utf-8')
lines = s.split('\n')
As a general solution you should check the character encoding the server provides in the response to your request and use that.
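For example, with urllib.request the declared charset is available from the response headers; a small sketch that falls back to UTF-8 when the server does not declare one:
import urllib.request

page = urllib.request.urlopen("http://docs.python-requests.org/en/latest/")
# get_content_charset() reads the charset from the Content-Type header
charset = page.headers.get_content_charset() or 'utf-8'
lines = page.read().decode(charset).split('\n')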
