python add http:// to url open sys args - python

Very new to python, I'm trying to take in a command line argument which is a website and set it to a variable. I use the line below to do this.
sitefile = ur.urlopen(sys.argv[1])
The problem is it only works when formatted perfectly in the command line as 'python filename.py http://mywebsite.com/contacts.html'. I want to be able to drop the http://. I've tried the following 2 ways:
sitefile = ur.urlopen(sys.argv[1].startswith('http://'))
gets error message: AttributeError: 'bool' object has no attribute 'timeout'
sitefile = 'http://' + (ur.urlopen(sys.argv[1]))
gets error message: ValueError: unknown url type: 'mywebsite.com/contacts.html'. It appears to just ignore the first half of this concat.

What's the ur? give the complete code.
sys.argv[1].startswith('http://') return a bool object, drop the http:// should use sys.argv[1].replace('http://', '').
I think the code should like the following lines:
#!/usr/bin/env python
# encoding: utf-8
import sys
import urllib2
if len(sys.argv) != 2:
print('usage: {} <url>'.format(sys.argv[0]))
sys.exit()
url = sys.argv[1]
req = urllib2.urlopen(url)
print(req.read())

Related

Use rtkit to get content from tickets in Request Tracker

i'm trying to get some contents from tickets with REST api in Ubuntu 16.04 and i'm having truble getting that content using the next code :
from rtkit.resource import RTResource
from rtkit.authenticators import QueryStringAuthenticator
from rtkit.errors import RTResourceError
from rtkit import set_logging
import logging
import re
set_logging('debug')
logger = logging.getLogger('rtkit')
resource = RTResource('http://ubuntu/rt/REST/1.0/', 'root', '**passwd**', QueryStringAuthenticator)
try:
response = resource.get(path='ticket/2')
myTicket = response.as_object() ## Returns an RtObj instance
except RTResourceError as e:
logger.error(e.response.status_int)
logger.error(e.response.status)
logger.error(e.response.parsed)
And the terminal is giving this error:
File "LoginQuery.py", line 85, in <module>
myTicket = response.as_object() ## Returns an RtObj instance
AttributeError: 'RTResponse' object has no attribute 'as_object'
Did someone had this problem too?? and know how to solve it??
Help :)
according to the package documentation it seems like the proper way to read the response is to use response.parsed:
try:
response = resource.get(path='ticket/1')
for r in response.parsed:
for t in r:
logger.info(t)
except RTResourceError as e:
logger.error(e.response.status_int)
logger.error(e.response.status)
logger.error(e.response.parsed)
Yes, but i was trying to get the information from the contents separately... and some hours latter i cam with this :
try:
response = resource.get(path='ticket/2')
Ticket = response.parsed
Criation = Ticket[0][12][1]
This allow me to get de date when it was created

Passing a list as a url value to urlopen

Motivation
Motivated by this problem - the OP was using urlopen() and accidentally passed a sys.argv list instead of a string as a url. This error message was thrown:
AttributeError: 'list' object has no attribute 'timeout'
Because of the way urlopen was written, the error message itself and the traceback is not very informative and may be difficult to understand especially for a Python newcomer:
Traceback (most recent call last):
File "test.py", line 15, in <module>
get_category_links(sys.argv)
File "test.py", line 10, in get_category_links
response = urlopen(url)
File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 420, in open
req.timeout = timeout
AttributeError: 'list' object has no attribute 'timeout'
Problem
Here is the shortened code I'm working with:
try:
from urllib.request import urlopen
except ImportError:
from urllib2 import urlopen
import sys
def get_category_links(url):
response = urlopen(url)
# do smth with response
print(response)
get_category_links(sys.argv)
I'm trying to think whether this kind of an error can be caught statically with either smart IDEs like PyCharm, static code analysis tools like flake8 or pylint, or with language features like type annotations.
But, I'm failing to detect the problem:
it is probably too specific for flake8 and pylint to catch - they don't warn about the problem
PyCharm does not warn about sys.argv being passed into urlopen, even though, if you "jump to source" of sys.argv it is defined as:
argv = [] # real value of type <class 'list'> skipped
if I annotate the function parameter as a string and pass sys.argv, no warnings as well:
def get_category_links(url: str) -> None:
response = urlopen(url)
# do smth with response
get_category_links(sys.argv)
Question
Is it possible to catch this problem statically (without actually executing the code)?
Instead of keeping it editor specific, you can use mypy to analyze your code. This way it will run on all dev environments instead of just for those who use PyCharm.
from urllib.request import urlopen
import sys
def get_category_links(url: str) -> None:
response = urlopen(url)
# do smth with response
get_category_links(sys.argv)
response = urlopen(sys.argv)
The issues pointed out by mypy for the above code:
error: Argument 1 to "get_category_links" has incompatible type List[str]; expected "str"
error: Argument 1 to "urlopen" has incompatible type List[str]; expected "Union[str, Request]"
Mypy here can guess the type of sys.argv because of its definition in its stub file. Right now some standard library modules are still missing from typeshed though, so you will have to either contribute them or ignore the errors related till they get added :-).
When to run mypy?
To catch such errors you can run mypy on the files with annotations with your tests in your CI tool. Running it on all files in project may take some time, for a small project it is your choice.
Add a pre-commit hook that runs mypy on staged files and points out issues right away(could be a little annoying to the dev if it takes a while).
Firstly, you need to check whether the url type is string or not and if string then check for ValueError exception(Valid url)
import sys
from urllib2 import urlopen
def get_category_links(url):
if type(url) != type(""): #Check if url is string or not
print "Please give string url"
return
try:
response = urlopen(url)
# do smth with response
print(response)
except ValueError: #If url is string but invalid
print "Bad URL"
get_category_links(sys.argv)

Python “TypeError: 'builtin_function_or_method' object is not subscriptable"

It looks like sth error in it, but i failed to find it.
from urllib.request import Request, urlopen
from urllib.error import URLError,HTTPError
from bs4 import BeautifulSoup
import re
print('https://v.qq.com/x/page/h03425k44l2.html\\\\n\\\\https://v.qq.com/x/cover/dn7fdvf2q62wfka/m0345brcwdk.html\\\\n\\\\http://v.qq.com/cover/2/2iqrhqekbtgwp1s.html?vid=c01350046ds')
web = input('请输入网址:')
if re.search(r'vid=',web) :
patten =re.compile(r'vid=(.*)')
vid=patten.findall(web)
vid=vid[0]
else:
newurl = (web.split("/")[-1])
vid =newurl.replace('.html', ' ')
#从视频页面找出vid
getinfo='http://vv.video.qq.com/getinfo?vids{vid}&otype=xlm&defaultfmt=fhd'.format(vid=vid.strip())
def getpage(url):
req = Request(url)
user_agent = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit'
req.add_header('User-Agent', user_agent)
try:
response = urlopen(url)
except HTTPError as e:
print('The server couldn\\\'t fulfill the request.')
print('Error code:', e.code)
except URLError as e:
print('We failed to reach a server.')
print('Reason:', e.reason)
html = response.read().decode('utf-8')
return(html)
#打开网页的函数
a = getpage(getinfo)
soup = BeautifulSoup(a, "html.parser")
for e1 in soup.find_all('url'):
ippattent = re.compile(r"((?:(2[0-4]\\\\d)|(25[0-5])|([01]\\\\d\\\\d?))\\\\.){3}(?:(2[0-4]\\\\d)|(255[0-5])|([01]?\\\\d\\\\d?))")
if re.search(ippattent,e1.get_text()):
ip=(e1.get_text())
for e2 in soup.find_all('id'):
idpattent = re.compile(r"\\\\d{5}")
if re.search(idpattent,e2.get_text()):
id=(e2.get_text())
filename=vid.strip()+'.p'+id[2:]+'.1.mp4'
#找到ID和拼接FILENAME
getkey='http://vv.video.qq.com/getkey?format={id}&otype=xml&vt=150&vid{vid}&ran=0%2E9477521511726081\\\\&charge=0&filename={filename}&platform=11'.format(id=id,vid=vid.strip(),filename=filename)
#利用getinfo中的信息拼接getkey网址
b = getpage(getkey)
key=(re.findall(r'<key>(.*)<\\\\/key>',b))
videourl=ip+filename+'?'+'vkey='+key[0]
print('视频播放地址 '+videourl)
#完成了
I run it and get this:
Traceback (most recent call last):
File "C:\Users\DYZ_TOGA\Desktop\qq.py", line 46, in <module>
filename=vid.strip()+'.p'+id[2:]+'.1.mp4'
TypeError: 'builtin_function_or_method' object is not subscriptable
What should I do? I don't know how to change my code to correct it.
The root of your problem is here:
if re.search(idpattent,e2.get_text()):
id=(e2.get_text())
If this is false, you never set id. And that means id is the built-in function of that name, which gets the unique ID of any object. Since it's a function, not the string you expect, you can't do this:
id[2:]
Hence the error you are getting.
My suggestions are:
Use a different variable name; you would have get an error about it not being defined in this case, which would have made solving the problem easier
When you don't find the ID, don't continue the script; it won't work anyway. If you expected to find it, and are not sure why that's not happening, that's a different question you should ask separately.
id is a builtin function in python. it seems you are using the same to store variable. It is bad habit to use keyword as variable name. Use some different name instead.
if re.search(idpattent,e2.get_text()):
id=(e2.get_text())
filename=vid.strip()+'.p'+id[2:]+'.1.mp4'
If the above "if" is not true, id will not be set to string value.
By default id is a function is python . So you cannot do id[2:]
because python expects id().

write into a text file issue

I have this simple python code:
import facebook # pip install facebook-sdk
import json
import codecs
ACCESS_TOKEN = ''
g = facebook.GraphAPI(ACCESS_TOKEN)
for i in range (0,2):
f = open("samples"+str(i)+".txt", 'w')
my_likes = [ like['id']
for like in g.request('search', { 'q' : '&','type' : 'page', 'limit' : 5000 ,'offset' :i , 'locale' : 'ar_AR' })['data'] ]
f.write( my_likes )
f.close()
what I want to do is to store data exists on my_likes list into a text file. but a write() takes "no keyword arguments" error message appear to me. what am I doing wrong here?
edit: if I remove indent=1 an error message appears: "expected a character buffer object"
what I am doing wrong here?
The error message tells you what you are doing wrong; you are passing a keyword argument (indent=1) to write(), which shouldn't have any. As per the documentation, write only has one argument, the string to write. If you want to indent it in the file, add a tab ('\t') to the start of each line in the string.

How do I search for text in a page using regular expressions in Python?

I'm trying to create a simple module for phenny, a simple IRC bot framework in Python. The module is supposed to go to http://www.isup.me/websitetheuserrequested to check is a website was up or down. I assumed I could use regex for the module seeing as other built-in modules use it too, so I tried creating this simple script although I don't think I did it right.
import re, urllib
import web
isupuri = 'http://www.isup.me/%s'
check = re.compile(r'(?ims)<span class="body">.*?</span>')
def isup(phenny, input):
global isupuri
global cleanup
bytes = web.get(isupuri)
quote = check.findall(bytes)
result = re.sub(r'<[^>]*?>', '', str(quote[0]))
phenny.say(result)
isup.commands = ['isup']
isup.priority = 'low'
isup.example = '.isup google.com'
It imports the required web packages (I think), and defines the string and the text to look for within the page. I really don't know what I did in those four lines, I kinda just ripped the code off another phenny module.
Here is an example of a quotes module that grabs a random quote from some webpage, I kinda tried to use that as a base: http://pastebin.com/vs5ypHZy
Does anyone know what I am doing wrong? If something needs clarified I can tell you, I don't think I explained this enough.
Here is the error I get:
Traceback (most recent call last):
File "C:\phenny\bot.py", line 189, in call
try: func(phenny, input)
File "C:\phenny\modules\isup.py", line 18, in isup
result = re.sub(r'<[^>]*?>', '', str(quote[0]))
IndexError: list index out of range
try this (from http://docs.python.org/release/2.6.7/library/httplib.html#examples):
import httplib
conn = httplib.HTTPConnection("www.python.org")
conn.request("HEAD","/index.html")
res = conn.getresponse()
if res.status >= 200 and res.status < 300:
print "up"
else:
print "down"
You will also need to add code to follow redirects before checking the response status.
edit
Alternative that does not need to handle redirects but uses exceptions for logic:
import urllib2
request = urllib2.Request('http://google.com')
request.get_method = lambda : 'HEAD'
try:
response = urllib2.urlopen(request)
print "up"
print response.code
except urllib2.URLError, e:
# failure
print "down"
print e
You should do your own tests and choose the best one.
The error means your regexp wasn't found anywhere on the page (the list quote has no element 0).

Categories