friends.
I'm trying to rewrite one of my little tools. basically, it gets an input from user, and if that input doesn't contain the "base url" a function will construct that input into a valid url for other part of the program to work on.
if I were wrote it so the program only accepts valid url as input, it will work; however if I pass a string and construct it, urllib2.urlopen() will fail, and I have no idea why, as the value return is exactly the same str value...
import urllib2
import re
class XunLeiKuaiChuan:
kuaichuanBaseAddress = 'http://kuaichuan.xunlei.com/d/'
regexQuery = 'file_name=\"(.*?)\"\sfile_url=\"(.*?)\sfile_size=\"(.*?)\"'
agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2)'
def buildLink(self, aLink):
if aLink == '':
return
if 'xunlei.com' not in aLink:
aLink = self.kuaichuanBaseAddress + aLink
return aLink
def decodeLink(self, url):
url = self.buildLink(url) #it will return correct url with the value provided.
print 'in decodeLink ' + url
urlReq = urllib2.Request(url)
urlReq.add_header('User-agent', self.agent)
pageContent = urllib2.urlopen(urlReq).read()
realLinks = re.findall(self.regexQuery, pageContent)
return realLinks
test = XunLeiKuaiChuan()
link='y7L1AwKuOwDeCClS528'
link2 = 'http://kuai.xunlei.com/d/y7L1AwKuOwDeCClS528'
s = test.decodeLink(link2)
print s
when I call it with link2 it will function as expected. and will fail when use 'link' someone tell me what I miss here? my "old version" work with only accept full url, but this unknown behavior is killing me here......Thank you.
btw if with full url it returns an empty list, just open the url and enter the catcha on the page. they do it to prevent some kind of 'attacks'....
never mind I got hostname in the code wrong.
Related
I am new to bottle. I have searched for the answer but could not get any.
Is there anyway to load a page without redirect?
The code is like this:
from bottle import get, run, template
#get('/list/item')
def listItems():
r = requests.get('my/url/which/works/')
return r.content
if __name__ == '__main__':
run(host='localhost', port=8080 )
and the webpage is empty. I have also tried return r.text, return template(r.content), return str(r.content), return str(r.content.decode('utf-8')), and
nf = urllib.urlopen('my/url/whick/works')
return nf.read()
none of them returns the page I want.
However, if I write this return r.content[0], it works. the page will show the first letter, which is '<'. But if I write return r.content[0:100], it returns a empty page again.
If I run the requests in command line, this is what it returns:
>>> import requests
>>> r = requests.get('my/url/which/works/')
>>>
>>> r.content
'<?xml version="1.0" encoding="utf-8" ?> \n<body copyright="...>\n</body>\n'
Is that possible that anyone can help about this? Thank you very much.
Your question doesn't specify what you expect to see when your code works as expected, but this may help you:
from bottle import get, run, template, response
#get('/list/item')
def listItems():
r = requests.get('my/url/which/works/')
response.content_type = 'text/plain' # added this
return r.content
...
The only change is that your webserver will now be setting the content type of the response to text/plain, which should make your browser render the page.
Longer term, if (as I'm inferring from your question) you intend to return XML responses, then I suggest either installing a browser plugin (here's an example), or, better yet, use curl or wget to see the precise response. This will help you debug your server's responses.
Let's say I'm passing a variable into a function, and I want to ensure it's properly formatted for my end use, with consideration for several potential unwanted formats.
Example; I want to store only lowercase representations of url addresses, without http:// or https://.
def standardize(url):
# Lowercase
temp_url = url
url = temp_url.lower()
# Remove 'http://'
if 'http://' in url:
temp_url = url
url = temp_url.replace('http://', '')
if 'https://' in url:
temp_url = url
url = temp_url.replace('https://', '')
return url
I'm only just encroaching on the title of Novice, and was wondering if there is more pythonic approach to achieving this type of process?
End goal being the trasformation of a url as such https://myurl.com/RANDoM --> myurl.com/random
The application of url string formating isn't of any particular importance.
A simple re.sub will do the trick:
import re
def standardize(url):
return re.sub("^https?://",'',url.lower())
# with 'https'
print(standardize('https://myurl.com/RANDoM')) # prints 'myurl.com/random'
# with 'http'
print(standardize('http://myurl.com/RANDoM')) # prints 'myurl.com/random'
# both works
def standardize(url):
return url.lower().replace("https://","").replace("http://","")
That's as simple as I can make it, but, the chaining is a little ugly.
If you want to import regex, could also do something like this:
import re
def standardize(url):
return re.sub("^https?://", "", url.lower())
I tried to get url as an input for my python program not working
import request
get_url=raw_input(" ")
Page = get.request('get_url')
But
Page = get.request('www.armagsolutions.com')
is working.
Can someone help me with a example to get url as input?
In this line
Page = get.request('get_url')
you pass as arg to function str 'get_url' instead of variable get_url. Use instead
Page = get.request(get_url)
I am trying to write a class in Python to open a specific URL given and return the data of that URL...
class Openurl:
def download(self, url):
req = urllib2.Request( url )
content = urllib2.urlopen( req )
data = content.read()
content.close()
return data
url = 'www.somesite.com'
dl = openurl()
data = dl.download(url)
Could someone correct my approach? I know one might ask why not just directly open it, but I want to show a message while it is being downloaded. The class will only have one instance.
You have a few problems.
One that I'm sure is not in your original code is the failure to import urllib2.
The second problem is that dl = openurl() should be dl = Openurl(). This is because Python is case sensitive.
The third problem is that your URL needs http:// before it. This gets rid of an unknown url type error. After that, you should be good to go!
It should be dl = Openurl(), python is case sensitive
import urllib, urllib2, json
def make_request(method, base, path, params):
if method == 'GET':
return json.loads(urllib2.urlopen(base+path+"?"+urllib.urlencode(params)).read())
elif method == 'POST':
return json.loads(urllib2.urlopen(base+path, urllib.urlencode(params)).read())
api_key = "5f1d5cb35cac44d3b"
print make_request("GET", "https://indit.ca/api/", "v1/version", {"api_key": api_key})
This set of code returns should return back the version and status like {status: 'ok', version: '1.1.0'}
What code do I need to add to print that response ?
It's hard to tell what the problem is without a complete, otherwise-working example (I can't even resolve host indit.ca), but I can explain how you can debug this yourself. Break it down step by step:
import urllib, urllib2, json
def make_request(method, base, path, params):
if method == 'GET':
url = base+path+"?"+urllib.urlencode(params)
print 'url={}'.format(url)
req = urllib2.urlopen(url)
print 'req={}'.format(req)
body = req.read()
print 'body={}'.format(body)
obj = json.loads(body)
print 'obj={}'.format(obj)
return obj
elif method == 'POST':
# You could do the same here, but your test only uses "GET"
return json.loads(urllib2.urlopen(base+path, urllib.urlencode(params)).read())
api_key = "5f1d5cb35cac44d3b"
print make_request("GET", "https://indit.ca/api/", "v1/version", {"api_key": api_key})
Now you can see where it goes wrong. Is it generating the right URL? (What happens if you paste that URL into a browser address bar, or a wget or curl command line?) Does urlopen return the kind of object you expected? Does the body look right? And so on.
Ideally, this will solve the problem for you. If not, at least you'll have a much more specific question to ask, and are much more likely to get a useful answer.