I am trying to download code from a Subversion repository without using the svn client program, using Python's urllib2 module instead. Below is my script. The problem is that it returns a web page of sorts rather than the actual source code: I can see links to the source files but not their contents.
Does anyone have any suggestions on how I can download the actual source code from Subversion with urllib2?
#! /usr/bin/env python
import urllib2

def sub():
    theurl = 'https://Intranet-Server/svn/FancySoftware/trunk/'
    username = 'username'
    password = 'password'

    passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
    passman.add_password(None, theurl, username, password)
    authhandler = urllib2.HTTPBasicAuthHandler(passman)
    opener = urllib2.build_opener(authhandler)
    print "OPENER :", opener
    urllib2.install_opener(opener)
    pagehandle = urllib2.urlopen(theurl)
    print "PAGEHANDLE :", pagehandle
    return pagehandle

if __name__ == "__main__":
    ret = sub()
    for line in ret:
        print line
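The URL above points at a directory, so the Apache mod_dav_svn front end answers with an HTML listing of that directory; the raw contents of a file are served at the file's own URL. For what it's worth, a minimal sketch of that idea (the file name setup.py is only a placeholder, and it assumes the Basic Auth opener from sub() above has already been installed):

import urllib2

theurl = 'https://Intranet-Server/svn/FancySoftware/trunk/'

# Request a file URL instead of the trunk/ directory; mod_dav_svn then
# returns the file's raw contents rather than an HTML listing.
filehandle = urllib2.urlopen(theurl + 'setup.py')
source = filehandle.read()

with open('setup.py', 'wb') as out:
    out.write(source)

To discover the file names in the first place you would have to parse the links out of the HTML listing, or know the repository layout in advance; the svn client and libraries such as pysvn handle that part for you.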
I need to run Python to fetch some artifacts from a repository using the following syntax (invoked from a batch file with its variables), so the way the arguments are passed cannot be changed:
python get_artifacts.py %USERNAME%:%PASSWORD% http://url/artifactory/package.zip
My Python script is the following:
import sys
import requests
from requests.auth import HTTPBasicAuth

def get_artifact(url, save_artifact_name, username, password, chunk_size=128):
    try:
        get_method = requests.get(url,
                                  auth=HTTPBasicAuth(username, password), stream=True)
        with open(save_artifact_name, 'wb') as artifact:
            for chunk in get_method.iter_content(chunk_size=chunk_size):
                artifact.write(chunk)
    except requests.exceptions.RequestException as error:
        sys.exit(str(error))

if __name__ == '__main__':
    username_and_password = sys.argv[1].split(':')
    username = username_and_password[0]
    password = username_and_password[1]
    url = sys.argv[2]
    save_artifact_name = url.split("/")[-1]

    print(f'Retrieving artifact {save_artifact_name}...')
    get_artifact(url, save_artifact_name, username, password)
    print("Finished successfully!")
Now I CAN see my package being downloaded, but the resulting zip package is invalid.
With another tool such as curl.exe the same download works fine.
So I am definitely missing something in the Python script, but I am not able to determine what (the download runs, but the package is invalid).
Thanks a lot!
Here's an answer that is MUCH closer to the original, keeping the chunking so it works with minimal memory. It simply places the open() before the downloading code:
import sys
import requests
from requests.auth import HTTPBasicAuth

def get_artifact(url, save_artifact_name, username, password, chunk_size=128):
    try:
        with open(save_artifact_name, 'wb') as artifact:
            get_method = requests.get(url,
                                      auth=HTTPBasicAuth(username, password), stream=True)
            for chunk in get_method.iter_content(chunk_size=chunk_size):
                artifact.write(chunk)
    except requests.exceptions.RequestException as error:
        sys.exit(str(error))

if __name__ == '__main__':
    username_and_password = sys.argv[1].split(':')
    username = username_and_password[0]
    password = username_and_password[1]
    url = sys.argv[2]
    save_artifact_name = url.split("/")[-1]

    print(f'Retrieving artifact {save_artifact_name}...')
    get_artifact(url, save_artifact_name, username, password)
    print("Finished successfully!")
You're streaming the file a few bytes at a time and writing each chunk to the file, but opening the file anew each time, so I suspect you're just seeing the last chunk in the file. Unless the file is truly huge, you should be able to simply load the entire thing into memory and then write it out. Here's the modified version:
import sys
import requests
from requests.auth import HTTPBasicAuth

def get_artifact(url, save_artifact_name, username, password):
    try:
        get_method = requests.get(url,
                                  auth=HTTPBasicAuth(username, password))
        with open(save_artifact_name, 'wb') as artifact:
            artifact.write(get_method.content)
    except requests.exceptions.RequestException as error:
        sys.exit(str(error))

if __name__ == '__main__':
    username_and_password = sys.argv[1].split(':')
    username = username_and_password[0]
    password = username_and_password[1]
    url = sys.argv[2]
    save_artifact_name = url.split("/")[-1]

    print(f'Retrieving artifact {save_artifact_name}...')
    get_artifact(url, save_artifact_name, username, password)
    print("Finished successfully!")
That should fetch the entire file in one go and write it to your output. I've just tested this with a 5MB test file I found online and it downloaded just lovely.
The chunk size is no longer needed as you're not downloading in chunks. :)
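One more thing worth ruling out, not covered above: if the server answers with an error page (for example a 401 for bad credentials), that HTML body gets written out under the .zip name and the result looks like a corrupt archive. A hedged variation of the same function that fails early on HTTP errors via requests' raise_for_status():

import sys
import requests
from requests.auth import HTTPBasicAuth

def get_artifact(url, save_artifact_name, username, password, chunk_size=128):
    try:
        response = requests.get(url,
                                auth=HTTPBasicAuth(username, password), stream=True)
        # Abort with the HTTP error instead of saving an error page as a .zip.
        response.raise_for_status()
        with open(save_artifact_name, 'wb') as artifact:
            for chunk in response.iter_content(chunk_size=chunk_size):
                artifact.write(chunk)
    except requests.exceptions.RequestException as error:
        sys.exit(str(error))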
I'm using urllib.request in Python to try to download some build information from TeamCity. This request used to work without a username and password; however, a recent security change means I must now supply them. So I have tried each of the two approaches below:
Attempt 1
url = 'http://<domain>/httpAuth/app/rest/buildTypes/<buildlabel>/builds/running:false?count=1&start=0'
# create a password manager
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# Add the username and password.
top_level_url = "http://<domain>/httpAuth/app/rest/buildTypes/id:<buildlabel>/builds/running:false?count=1&start=0"
password_mgr.add_password(None, top_level_url, username, password)
handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
# create "opener" (OpenerDirector instance)
opener = urllib.request.build_opener(handler)
# use the opener to fetch a URL
opener.open(url)
Attempt 2
url = 'http://<username>:<password>@<domain>/httpAuth/app/rest/buildTypes/id:buildlabel/builds/running:false?count=1&start=0'
rest_api = urllib.request.urlopen(url)
Both of these return "HTTP Error 401: Unauthorized". However, if I print 'url' and copy the output into a browser, the link works perfectly; it is only through Python that I get the above error.
I use something very similar in a Perl script and that works perfectly too.
* SOLVED BELOW *
Solved this using:

credentials(url, username, password)
rest_api = urllib2.urlopen(url)
latest_build_info = rest_api.read()
latest_build_info = latest_build_info.decode("UTF-8")
# Then parse this xml for the information I want.

def credentials(url, username, password):
    p = urllib2.HTTPPasswordMgrWithDefaultRealm()
    p.add_password(None, url, username, password)
    handler = urllib2.HTTPBasicAuthHandler(p)
    opener = urllib2.build_opener(handler)
    urllib2.install_opener(opener)
As a side note, I then want to download a file:
credentials(url, username, password)
urllib2.urlretrieve(url, downloaded_file)
Where the URL is:
http://<teamcityServer>/repository/download/<build Label>/<BuildID>:id/Filename.zip
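One caveat with that side note: urlretrieve lives in urllib (urllib.request in Python 3), not urllib2, and the Python 2 urllib.urlretrieve does not go through an opener installed with urllib2.install_opener, so the credentials would not be sent. A minimal sketch that downloads the file through the installed opener instead (the local file name is just a placeholder):

credentials(url, username, password)   # installs the Basic Auth opener, as above

response = urllib2.urlopen(url)        # goes through the installed opener
with open('Filename.zip', 'wb') as downloaded_file:
    downloaded_file.write(response.read())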
The server I'm trying to log on to and download a file from uses Basic Auth, as I can confirm from Chrome Dev Tools and some tests. So I wrote the code below; perhaps not a great example of OOP, but it should make sense.
import urllib2

class Utils(object):
    def __init__(self, username, password):
        self.username = username
        self.password = password
        self.top_level_url = 'http://test.com/'

        password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
        password_mgr.add_password(None, self.top_level_url, self.username, self.password)
        basic_auth_handler = urllib2.HTTPBasicAuthHandler(password_mgr)
        opener = urllib2.build_opener(basic_auth_handler)
        urllib2.install_opener(opener)

    def download(self, filename):
        url = self.top_level_url + filename
        req = urllib2.Request(url)
        try:
            response = urllib2.urlopen(req)
            return response
        except urllib2.HTTPError as e:
            print e.headers
            raise
Strange things happen when I initialize a Utils object and download the file repeatedly:
import time

u = Utils('username', 'password')
index = 0
while 1:
    resp = u.download('file.txt')
    index += 1
    time.sleep(1)
The script works for the first 5 downloads, but on the 6th it raises HTTPError 401. However, if I change the code to add an 'Authorization: Basic ***' request header myself instead of using HTTPBasicAuthHandler, it works every time... So is something wrong with my code, or with the server setup?
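For reference, a minimal sketch of the workaround mentioned above: build the 'Authorization: Basic ...' header yourself and attach it to every request, instead of relying on HTTPBasicAuthHandler (which only sends credentials after the server replies with a 401 challenge). The function name and URL here are illustrative only:

import base64
import urllib2

def download_preauth(url, username, password):
    # Encode "username:password" as the Basic Auth token ourselves,
    # so the credentials go out with the very first request.
    token = base64.b64encode('%s:%s' % (username, password))
    req = urllib2.Request(url)
    req.add_header('Authorization', 'Basic %s' % token)
    return urllib2.urlopen(req)

response = download_preauth('http://test.com/file.txt', 'username', 'password')
print response.read()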
I am fairly new to web programming, but I am trying to log in to a Google account from a Python application rather than through the standard flow, and so far I have not been able to make it work.
Has anyone tried this before? Can anyone help?
I made a Python class that handles Google login and is then able to fetch any Google service page that requires the user to be logged in:
import requests
from bs4 import BeautifulSoup

class SessionGoogle:
    def __init__(self, url_login, url_auth, login, pwd):
        self.ses = requests.session()
        login_html = self.ses.get(url_login)
        soup_login = BeautifulSoup(login_html.content).find('form').find_all('input')
        my_dict = {}
        for u in soup_login:
            if u.has_attr('value'):
                my_dict[u['name']] = u['value']
        # override the relevant inputs with the login and pwd:
        my_dict['Email'] = login
        my_dict['Passwd'] = pwd
        self.ses.post(url_auth, data=my_dict)

    def get(self, URL):
        return self.ses.get(URL).text
The idea is to fetch the login page, grab the hidden input values (such as GALX), and send them back to Google together with the login and password. It requires the requests and BeautifulSoup modules.
Example of use:
url_login = "https://accounts.google.com/ServiceLogin"
url_auth = "https://accounts.google.com/ServiceLoginAuth"
session = SessionGoogle(url_login, url_auth, "myGoogleLogin", "myPassword")
print session.get("http://plus.google.com")
Hope this helps
Although probably not exactly what you were looking for, here is some code from a similar post that did run for me.
import urllib2

def get_unread_msgs(user, passwd):
    auth_handler = urllib2.HTTPBasicAuthHandler()
    auth_handler.add_password(
        realm='New mail feed',
        uri='https://mail.google.com',
        user='%s@gmail.com' % user,
        passwd=passwd
    )
    opener = urllib2.build_opener(auth_handler)
    urllib2.install_opener(opener)
    feed = urllib2.urlopen('https://mail.google.com/mail/feed/atom')
    return feed.read()

print get_unread_msgs("put-username-here", "put-password-here")
reference:
How to auto log into gmail atom feed with Python?
2020 update for Python 3:
import urllib.request

def unread_messages(user, passwd):
    auth_handler = urllib.request.HTTPBasicAuthHandler()
    auth_handler.add_password(
        realm='New mail feed',
        uri='https://mail.google.com',
        user='%s@gmail.com' % user,
        passwd=passwd
    )
    opener = urllib.request.build_opener(auth_handler)
    urllib.request.install_opener(opener)
    feed = urllib.request.urlopen('https://mail.google.com/mail/feed/atom')
    return feed.read()

print(unread_messages('username', 'password'))
You can use Python's urllib, urllib2, and cookielib libraries to log in.
import urllib, urllib2, cookielib

def test_login():
    username = ''  # Gmail Address
    password = ''  # Gmail Password
    cookie_jar = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie_jar))
    login_dict = urllib.urlencode({'username': username, 'password': password})
    opener.open('https://accounts.google.com/ServiceLogin', login_dict)
    response = opener.open('https://plus.google.com/explore')
    print response.read()

if __name__ == '__main__':
    test_login()
What's the best way to specify a proxy with username and password for an http connection in python?
This works for me:
import urllib2

proxy = urllib2.ProxyHandler({'http': 'http://username:password@proxyurl:proxyport'})
auth = urllib2.HTTPBasicAuthHandler()
opener = urllib2.build_opener(proxy, auth, urllib2.HTTPHandler)
urllib2.install_opener(opener)

conn = urllib2.urlopen('http://python.org')
return_str = conn.read()
Use this:
import requests
proxies = {"http":"http://username:password#proxy_ip:proxy_port"}
r = requests.get("http://www.example.com/", proxies=proxies)
print(r.content)
I think it's much simpler than using urllib. I don't understand why people love using urllib so much.
Set an environment variable named http_proxy like this: http://username:password@proxy_url:port
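A minimal sketch of that approach (the proxy credentials, host and port are placeholders); urllib2's default ProxyHandler reads http_proxy from the environment and uses the embedded credentials for Proxy-Authorization, and requests honors the same variable:

import os
import urllib2

# Placeholder proxy credentials, host and port; set before the first request.
os.environ['http_proxy'] = 'http://username:password@proxy_url:8080'

# The default opener built by urllib2 includes a ProxyHandler that picks
# the proxy (and its credentials) up from the environment.
print urllib2.urlopen('http://www.example.com/').read()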
The best way to go through a proxy that requires authentication is to use urllib2 to build a custom URL opener, then use that opener for all the requests you want to send through the proxy. Note in particular that you probably don't want to embed the proxy password in the URL or the Python source code (unless it's just a quick hack).
import urllib2

def get_proxy_opener(proxyurl, proxyuser, proxypass, proxyscheme="http"):
    password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, proxyurl, proxyuser, proxypass)
    proxy_handler = urllib2.ProxyHandler({proxyscheme: proxyurl})
    proxy_auth_handler = urllib2.ProxyBasicAuthHandler(password_mgr)
    return urllib2.build_opener(proxy_handler, proxy_auth_handler)

if __name__ == "__main__":
    import sys
    if len(sys.argv) > 4:
        url_opener = get_proxy_opener(*sys.argv[1:4])
        for url in sys.argv[4:]:
            print url_opener.open(url).headers
    else:
        print "Usage:", sys.argv[0], "proxy user pass fetchurls..."
In a more complex program, you can separate these components out as appropriate (for instance, using only one password manager for the lifetime of the application). The Python documentation has more examples of doing complex things with urllib2 that you might also find useful.
Or if you want to install it, so that it is always used with urllib2.urlopen (so you don't need to keep a reference to the opener around):
import urllib2
url = 'www.proxyurl.com'
username = 'user'
password = 'pass'
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
# None, with the "WithDefaultRealm" password manager means
# that the user/pass will be used for any realm (where
# there isn't a more specific match).
password_mgr.add_password(None, url, username, password)
auth_handler = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)
print urllib2.urlopen("http://www.example.com/folder/page.html").read()
Here is the method using urllib.request:
import urllib.request
# set up authentication info
authinfo = urllib.request.HTTPBasicAuthHandler()
proxy_support = urllib.request.ProxyHandler({"http" : "http://ahad-haam:3128"})
# build a new opener that adds authentication and caching FTP handlers
opener = urllib.request.build_opener(proxy_support, authinfo,
                                     urllib.request.CacheFTPHandler)
# install it
urllib.request.install_opener(opener)
f = urllib.request.urlopen('http://www.python.org/')
"""