How to create a class to download a file in Python - python

I am trying to write a class in Python that opens a given URL and returns that URL's data...
class Openurl:
    def download(self, url):
        req = urllib2.Request(url)
        content = urllib2.urlopen(req)
        data = content.read()
        content.close()
        return data
url = 'www.somesite.com'
dl = openurl()
data = dl.download(url)
Could someone correct my approach? I know one might ask why not just open it directly, but I want to show a message while the file is being downloaded. The class will only have one instance.

You have a few problems.
One, which I'm sure is not in your original code, is the failure to import urllib2.
The second is that dl = openurl() should be dl = Openurl(), because Python is case sensitive.
The third is that your URL needs http:// in front of it; that gets rid of an "unknown url type" error. After that, you should be good to go!
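Putting those three fixes together, a corrected sketch might look like this (shown with Python 3's urllib.request, which replaced urllib2; the URL-normalizing helper is an addition for illustration, not part of the original code):

```python
import urllib.request  # urllib2 became urllib.request in Python 3

class Openurl:
    def normalize(self, url):
        # Fix for the third problem: urlopen needs a scheme,
        # otherwise it raises an "unknown url type" error.
        if not url.startswith(('http://', 'https://')):
            url = 'http://' + url
        return url

    def download(self, url):
        url = self.normalize(url)
        print('Downloading %s ...' % url)  # the message the asker wanted to show
        with urllib.request.urlopen(url) as response:
            return response.read()

dl = Openurl()  # capital O: Python is case sensitive
```

data = dl.download('www.somesite.com') would then fetch the page, assuming the site exists and is reachable.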

It should be dl = Openurl(); Python is case sensitive.

Related

Transform a multiple line url request into a function in Python

I am trying to download a series of text files from different websites. I am using urllib.request with Python. I want to extend the list of URLs without making the code long.
The working sequence is
import urllib.request
url01 = 'https://web.site.com/this.txt'
url02 = 'https://web.site.com/kind.txt'
url03 = 'https://web.site.com/of.txt'
url04 = 'https://web.site.com/link.txt'
[...]
urllib.request.urlretrieve(url01, "Liste n°01.txt")
urllib.request.urlretrieve(url02, "Liste n°02.txt")
urllib.request.urlretrieve(url03, "Liste n°03.txt")
[...]
The number of files to download is increasing and I want to keep the second part of the code short.
I tried
i = 0
while i<51
i = i +1
urllib.request.urlretrieve( i , "Liste n°0+"i"+.txt")
It doesn't work, and I am thinking that a while loop can be used for strings but not for requests.
So I was thinking of making it a function.
def newfunction(i)
return urllib.request.urlretrieve(url"i", "Liste n°0"+1+".txt")
But it seems that I am missing a big chunk of it.
The request itself works, but it seems I cannot adapt it to a long list of URLs.
As a general suggestion, I'd recommend the requests module for Python, rather than urllib.
Based on that, some naive code for a possible function:
import requests

def get_file(site, filename):
    target = site + "/" + filename
    try:
        r = requests.get(target, allow_redirects=True)
        open(filename, 'wb').write(r.content)
        return r.status_code
    except requests.exceptions.RequestException as e:
        print("File not downloaded, error: {}".format(e))
You can then call the function, passing in parameters of site and file name:
get_file('https://web.site.com', 'this.txt')
The function catches download errors and prints a message instead of stopping execution, so one failed file does not abort the rest. You could expand the exception handling to deal with files not being writable, but this should be a start.
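Since the asker's output files differ only by a counter, the repeated calls can be collapsed into a loop. A minimal sketch (the URLs are the question's placeholders, and the actual download line is commented out because it needs network access):

```python
urls = [
    'https://web.site.com/this.txt',
    'https://web.site.com/kind.txt',
    'https://web.site.com/of.txt',
    'https://web.site.com/link.txt',
]

def liste_name(i):
    # Reproduces the question's naming scheme: "Liste n°01.txt", "Liste n°02.txt", ...
    return 'Liste n°{:02d}.txt'.format(i)

for i, url in enumerate(urls, start=1):
    target = liste_name(i)
    # urllib.request.urlretrieve(url, target)  # or get_file(...) from the answer above
```

Adding a URL then only means appending one line to the list.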
It seems as if you're not casting the variable i to a string before concatenating it to the URL string. That may be why your code isn't working. The while-loop/for-loop approach shouldn't affect whether or not the requests get sent out. I recommend using the requests module for making requests as well; Mike's post covers roughly what the function should look like. I also recommend creating a Session object if you're going to make a whole lot of requests in one piece of code. The Session object keeps the underlying TCP connection open while you make your requests, which should reduce latency, CPU usage, and network congestion (https://en.wikipedia.org/wiki/HTTP_persistent_connection#Advantages). The code would look something like this:
import requests

with requests.Session() as s:
    for i in range(10):
        s.get(str(i) + '.com')  # make request
        # write to file here
To cast to a string you would want something like this:
i = 0
while i < 51:
    i = i + 1
    # note: the first argument must still be an actual URL, not the counter
    urllib.request.urlretrieve(url, "Liste n°0" + str(i) + ".txt")

dropbox.exceptions.ApiError: ... CreateFileRequestError('validation_error', None)

I want to get the URL of an image and save it to a variable via the Dropbox API in Python. I'm following this guide but I get the error shown in the title.
I searched for the function dbx.file_requests_create and am probably doing something wrong with the title or destination. Should the title be some existing resource? Because I just set it myself.
import dropbox
dbx = dropbox.Dropbox('Y2_M...aVP')
req = dbx.file_requests_create(title='Images', destination='/C:/Users/Dropbox/Apps/myProject/image.jpg')
print req.url
print req.id
EDIT: I found this link FileRequestError. It says:
There was an error validating the request. For example, the title was invalid, or there were disallowed characters in the destination path.
EDIT-2 [SOLVED]: Thanks to Aran-Fey and Greg for their comments, I solved the problem by replacing req = dbx.file_requests_create(title='Images', destination='/C:/Users/Dropbox/Apps/myProject/image.jpg') with
req = dbx.sharing_create_shared_link_with_settings('/image.jpg', settings=None)
Also, for people who have problems getting the image when sharing it: just change the last character of the link from 0 to 1, as mentioned in this and this.
You can add this line at the end to solve the matter: newURL = req.url[:-1] + "1".
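Slicing off the last character works for the default link format, but a slightly more defensive variant (a sketch using only the standard library; the example link below is made up) rewrites the dl query parameter explicitly instead of assuming the link ends in "0":

```python
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse

def direct_link(shared_url):
    """Turn a Dropbox shared link (?dl=0) into a direct-download link (?dl=1)."""
    parts = urlparse(shared_url)
    query = parse_qs(parts.query)
    query['dl'] = ['1']  # dl=1 asks Dropbox to serve the file itself
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))

# e.g. direct_link('https://www.dropbox.com/s/abc123/image.jpg?dl=0')
```

This way the link survives even if other query parameters are present or the parameter order changes.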

Weird json value urllib python

I'm trying to manipulate a dynamic JSON from this site:
http://esaj.tjsc.jus.br/cposgtj/imagemCaptcha.do
It has three elements: imagem, a base64 image; labelValorCaptcha, just a message; and uuidCaptcha, a value to pass as a parameter to play a sound at the link below:
http://esaj.tjsc.jus.br/cposgtj/somCaptcha.do?timestamp=1455996420264&uuidCaptcha=sajcaptcha_e7b072e1fce5493cbdc46c9e4738ab8a
When I open the first site in a browser and put the uuidCaptcha value after the equals sign ("..uuidCaptcha=") in the second link, the sound plays normally. I wrote a simple piece of code to catch these elements.
import urllib, json
url = "http://esaj.tjsc.jus.br/cposgtj/imagemCaptcha.do"
response = urllib.urlopen(url)
data = json.loads(response.read())
urlSound = "http://esaj.tjsc.jus.br/cposgtj/somCaptcha.do?timestamp=1455996420264&uuidCaptcha="
print urlSound + data['uuidCaptcha']
But I don't know what's happening: the captured value of uuidCaptcha doesn't work, and it opens an error web page.
Does anyone know why?
Thanks!
It works for me.
$ cat a.py
#!/usr/bin/env python
# encoding: utf-8
import urllib, json
url = "http://esaj.tjsc.jus.br/cposgtj/imagemCaptcha.do"
response = urllib.urlopen(url)
data = json.loads(response.read())
urlSound = "http://esaj.tjsc.jus.br/cposgtj/somCaptcha.do?timestamp=1455996420264&uuidCaptcha="
print urlSound + data['uuidCaptcha']
$ python a.py
http://esaj.tjsc.jus.br/cposgtj/somCaptcha.do?timestamp=1455996420264&uuidCaptcha=sajcaptcha_efc8d4bc3bdb428eab8370c4e04ab42c
As I said to @Charlie Harding, the best way is to download the page and get the JSON values from it, because this JSON is dynamic and needs an open web connection to exist.
More info here.
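For readers on Python 3, where urllib.urlopen no longer exists, roughly the same flow can be sketched like this. The URL-building helper is an addition for illustration, and the network part is commented out since the captcha endpoint must be fetched live:

```python
import json
import urllib.request

SOUND_BASE = 'http://esaj.tjsc.jus.br/cposgtj/somCaptcha.do'

def sound_url(uuid, timestamp):
    # Rebuild the sound URL from the uuidCaptcha value found in the JSON
    return '{}?timestamp={}&uuidCaptcha={}'.format(SOUND_BASE, timestamp, uuid)

# Live part (needs network; the JSON is generated fresh on each request):
# with urllib.request.urlopen('http://esaj.tjsc.jus.br/cposgtj/imagemCaptcha.do') as r:
#     data = json.loads(r.read())
# print(sound_url(data['uuidCaptcha'], 1455996420264))
```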

Check web content changed (OSM-tile, python)

In my Python code I'm downloading tiles from OpenStreetMap (OSM). For performance and traffic reasons, they are stored in a temp storage. However, before reusing this data, I want to check whether it is still up to date.
This is how simple download is done:
import urllib2
# Normal import without Version control:
url = r"http://a.tile.openstreetmap.org/1/1/1.png"
imgstr = urllib2.urlopen(url).read()
I'm searching for something like this (pseudo code):
imgstr = ...           # value from database
local_version = ...    # value from database
online_version = getonlineversionnumber(url)
if not (online_version == local_version):
    imgstr = urllib2.urlopen(url).read()
    local_version = online_version
Is there such a function as getonlineversionnumber?
**Question answered thanks to a hint from scai. No more answers required.**
It is good practice to post answers to your own questions for other readers. Here is what I have learned.
The property I was searching for is called an ETag (https://en.wikipedia.org/wiki/HTTP_ETag),
and accessed like:
import urllib2
url = r"http://a.tile.openstreetmap.org/1/1/1.png"
request = urllib2.Request(url)
opener = urllib2.build_opener()
firstdatastream = opener.open(request)
online_version=firstdatastream.headers.dict['etag']
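The ETag becomes really useful with a conditional request: send the stored value in an If-None-Match header and the server can reply 304 Not Modified instead of resending the tile. A sketch with Python 3's urllib.request (urllib2's successor); whether a 304 is actually returned depends on the server honouring the header:

```python
import urllib.request
import urllib.error

def conditional_request(url, local_etag=None):
    """Build a request that asks the server to skip unchanged content."""
    request = urllib.request.Request(url)
    if local_etag:
        # If the server's current ETag still matches, it replies 304 Not Modified
        request.add_header('If-None-Match', local_etag)
    return request

def fetch_if_changed(url, local_etag=None):
    """Return (data, etag); data is None when the content is unchanged."""
    try:
        with urllib.request.urlopen(conditional_request(url, local_etag)) as response:
            return response.read(), response.headers.get('ETag')
    except urllib.error.HTTPError as e:
        if e.code == 304:  # Not Modified: the cached copy is still current
            return None, local_etag
        raise
```

This avoids a separate version-check round trip: one request either delivers the new tile or confirms the cached one.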

python, urllib2 weird error?

Hi, friends.
I'm trying to rewrite one of my little tools. Basically, it takes input from the user, and if that input doesn't contain the base URL, a function constructs a valid URL from it for the other parts of the program to work on.
If I write it so the program only accepts a valid URL as input, it works; however, if I pass a plain string and construct the URL from it, urllib2.urlopen() fails, and I have no idea why, since the returned value is exactly the same str value...
import urllib2
import re

class XunLeiKuaiChuan:
    kuaichuanBaseAddress = 'http://kuaichuan.xunlei.com/d/'
    regexQuery = 'file_name=\"(.*?)\"\sfile_url=\"(.*?)\sfile_size=\"(.*?)\"'
    agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2)'

    def buildLink(self, aLink):
        if aLink == '':
            return
        if 'xunlei.com' not in aLink:
            aLink = self.kuaichuanBaseAddress + aLink
        return aLink

    def decodeLink(self, url):
        url = self.buildLink(url)  # it will return the correct url for the value provided
        print 'in decodeLink ' + url
        urlReq = urllib2.Request(url)
        urlReq.add_header('User-agent', self.agent)
        pageContent = urllib2.urlopen(urlReq).read()
        realLinks = re.findall(self.regexQuery, pageContent)
        return realLinks

test = XunLeiKuaiChuan()
link = 'y7L1AwKuOwDeCClS528'
link2 = 'http://kuai.xunlei.com/d/y7L1AwKuOwDeCClS528'
s = test.decodeLink(link2)
print s
When I call it with link2 it works as expected, but it fails when I use link. Can someone tell me what I'm missing here? My old version worked, but only accepted a full URL; this unknown behavior is killing me... Thank you.
By the way, if it returns an empty list even with the full URL, just open the URL in a browser and enter the captcha on the page; they do that to prevent some kind of 'attack'.
Never mind, I got the hostname in the code wrong: kuaichuanBaseAddress uses kuaichuan.xunlei.com, while the working link uses kuai.xunlei.com.
