Constructing a URI in Python? - python

I'm trying to convert a Java program to Python, and one thing that I am currently stuck on is working with URI's in Python. I found urllib.response in Python, but I'm struggling to figure out how to utilize it.
What I'm trying to do with this URI is obtain user info (particularly username and password), the host and path. In Java, there are associated methods (getUserInfo(), getHost(), and getPath()) for this, but I'm having trouble finding equivalents for this in Python, even after looking up the urllib.response Python documentation.
The equivalent code in Java is:
URI dbUri = new URI(env);
username = dbUri.getUserInfo().split(":")[0];
password = dbUri.getUserInfo().split(":")[1];
dbUrl = "jdbc:postgresql://" + dbUri.getHost() + dbUri.getPath();
and what would be the appropriate methods that could be used to convert this to Python?

Seems like you'd want to use something like urllib.parse.urlparse.
https://docs.python.org/3/library/urllib.parse.html#urllib.parse.urlparse
from urllib.parse import urlparse
db_url = urlparse(raw_url_string)
username = db_url.username
password = db_url.password
host = db_url.net_loc
path = db_url.path
...
You might need to adjust this a bit. There is a subtle difference between urlparse and urlsplit regarding parameters. Then you can use one of the urlunparse or unsplit.

Code
from urllib import parse
from urllib.parse import urlsplit
url = 'http://localhost:5432/postgres?user=postgres&password=somePassword'
split_url = urlsplit(url)
hostname = split_url.netloc
path = split_url.path
params = dict(parse.parse_qsl(split_url.query))
username = params['user']
password = params['password']
db_url = "jdbc:postgresql://" + hostname + path
print(db_uri)
Output
jdbc:postgresql://localhost:5432/postgres

Related

How to convert requests.RequestsCookieJar to string

I have a requests.cookies.RequestCookieJar object which contains multiple cookies from different domain/path. How can I extract a cookies string for a particular domain/path following the rules mentioned in here?
For example
>>> r = requests.get("https://stackoverflow.com")
>>> print(r.cookies)
<RequestsCookieJar[<Cookie prov=4df137f9-848e-01c3-f01b-35ec61022540 for .stackoverflow.com/>]>
# the function I expect
>>> getCookies(r.cookies, "stackoverflow.com")
"prov=4df137f9-848e-01c3-f01b-35ec61022540"
>>> getCookies(r.cookies, "meta.stackoverflow.com")
"prov=4df137f9-848e-01c3-f01b-35ec61022540"
# meta.stackoverflow.com is also satisfied as it is subdomain of .stackoverflow.com
>>> getCookies(r.cookies, "google.com")
""
# r.cookies does not contains any cookie for google.com, so it return empty string
I think you need to work with a Python dictionary of the cookies. (See my comment above.)
def getCookies(cookie_jar, domain):
cookie_dict = cookie_jar.get_dict(domain=domain)
found = ['%s=%s' % (name, value) for (name, value) in cookie_dict.items()]
return ';'.join(found)
Your example:
>>> r = requests.get("https://stackoverflow.com")
>>> getCookies(r.cookies, ".stackoverflow.com")
"prov=4df137f9-848e-01c3-f01b-35ec61022540"
NEW ANSWER
Ok, so I still don't get exactly what it is you are trying to achieve.
If you want to extract the originating url from a requests.RequestCookieJar object (so that you could then check if there is a match with a given subdomain) that is (as far as I know) impossible.
However, you could off course do something like:
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import requests
import re
class getCookies():
def __init__(self, url):
self.cookiejar = requests.get(url).cookies
self.url = url
def check_domain(self, domain):
try:
base_domain = re.compile("(?<=\.).+\..+$").search(domain).group()
except AttributeError:
base_domain = domain
if base_domain in self.url:
print("\"prov=" + str(dict(self.cookiejar)["prov"]) + "\"")
else:
print("No cookies for " + domain + " in this jar!")
Then if you do:
new_instance = getCookies("https://stackoverflow.com")
You could then do:
new_instance.check_domain("meta.stackoverflow.com")
Which would give the output:
"prov=5d4fda78-d042-2ee9-9a85-f507df184094"
While:
new_instance.check_domain("google.com")
Would output:
"No cookies for google.com in this jar!"
Then, if you (if needed) fine-tune the regex & create a list of urls, you could first loop through the list to create many instances and save them in eg a list or dict. In a second loop you could check another list of urls to see if their cookies might be present in any of the instances.
OLD ANSWER
The docs you link to explain:
items()
Dict-like items() that returns a list of name-value
tuples from the jar. Allows client-code to call
dict(RequestsCookieJar) and get a vanilla python dict of key value
pairs.
I think what you are looking for is:
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import requests
def getCookies(url):
r = requests.get(url)
print("\"prov=" + str(dict(r.cookies)["prov"]) + "\"")
Now I can run it like this:
>>> getCookies("https://stackoverflow.com")
"prov=f7712c78-b489-ee5f-5e8f-93c85ca06475"
actually , when I just have the problem as you are. but when I access the Class Define
class RequestsCookieJar(cookielib.CookieJar, MutableMapping):
I found a func called def get_dict(self, domain=None, path=None):
you can simply write code like this
raw = "rawCookide"
print(len(cookie))
mycookie = SimpleCookie()
mycookie.load(raw)
UCookie={}
for key, morsel in mycookie.items():
UCookie[key] = morsel.value
The following code is not promised to be "forward compatible" because I am accessing attributes of classes that were intentionally hidden (kind of) by their authors; however, if you must get into the attributes of a cookie, take a look here:
import http.cookies
import requests
import json
import sys
import os
aresponse = requests.get('https://www.att.com')
requestscookiejar = aresponse.cookies
for cdomain,cooks in requestscookiejar._cookies.items():
for cpath, cookgrp in cooks.items():
for cname,cattribs in cookgrp.items():
print(cattribs.version)
print(cattribs.name)
print(cattribs.value)
print(cattribs.port)
print(cattribs.port_specified)
print(cattribs.domain)
print(cattribs.domain_specified)
print(cattribs.domain_initial_dot)
print(cattribs.path)
print(cattribs.path_specified)
print(cattribs.secure)
print(cattribs.expires)
print(cattribs.discard)
print(cattribs.comment)
print(cattribs.comment_url)
print(cattribs.rfc2109)
print(cattribs._rest)
When a person needs to access the simple attributes of cookies is it likely less complicated to go after the following way. This avoids the use of RequestsCookieJar. Here we construct a single SimpleCookie instance by reading from the headers attribute of a response object instead of the cookies attribute. The name SimpleCookie would seem to imply a single cookie but that isn't what a simple cookie is. Try it out:
import http.cookies
import requests
import json
import sys
import os
def parse_cookies(http_response):
cookie_grp = http.cookies.SimpleCookie()
for h,v in http_response.headers.items():
if 'set-cookie' in h.lower():
for cook in v.split(','):
cookie_grp.load(cook)
return cookie_grp
aresponse = requests.get('https://www.att.com')
cookies = parse_cookies(aresponse)
print(str(cookies))
You can get list of domains in ResponseCookieJar and then dump the cookies for each domain with the following code:
import requests
response = requests.get("https://stackoverflow.com")
cjar = response.cookies
for domain in cjar.list_domains():
print(f'Cookies for {domain}: {cjar.get_dict(domain=domain)}')
Outputs:
Cookies for domain .stackoverflow.com: {'prov': 'efe8c1b7-ddbd-4ad5-9060-89ea6c29479e'}
In this example, only one domain is listed. It would have multiple lines in output if there were cookies for multiple domains in the Jar.
For many usecases, the cookie jar can be serialized by simply ignoring domains by calling:
dCookies = cjar.get_dict()
We can easily extract cookies string for a particular domain/path using functions already available in requests lib.
import requests
from requests.models import Request
from requests.cookies import get_cookie_header
session = requests.session()
r1 = session.get("https://www.google.com")
r2 = session.get("https://stackoverflow.com")
cookie_header1 = get_cookie_header(session.cookies, Request(method="GET", url="https://www.google.com"))
# '1P_JAR=2022-02-19-18; NID=511=Hz9Mlgl7DtS4uhTqjGOEolNwzciYlUtspJYxQ0GWOfEm9u9x-_nJ1jpawixONmVuyua59DFBvpQZkPzNAeZdnJjwiB2ky4AEFYVV'
cookie_header2 = get_cookie_header(session.cookies, Request(method="GET", url="https://stackoverflow.com"))
# 'prov=883c41a4-603b-898c-1d14-26e30e3c8774'
Request is used to prepare a :class:PreparedRequest <PreparedRequest>, which is sent to the server.
What you need is get_dict() method
a_session = requests.Session()
a_session.get('https://google.com/')
session_cookies = a_session.cookies
cookies_dictionary = session_cookies.get_dict()
# Now just print it or convert to json
as_string = json.dumps(cookies_dictionary)
print(cookies_dictionary)

Django : extract a path from a full URL

In a Django 1.8 simple tag, I need to resolve the path to the HTTP_REFERER found in the context. I have a piece of code that works, but I would like to know if a more elegant solution could be implemented using Django tools.
Here is my code :
from django.core.urlresolvers import resolve, Resolver404
# [...]
#register.simple_tag(takes_context=True)
def simple_tag_example(context):
# The referer is a full path: http://host:port/path/to/referer/
# We only want the path: /path/to/referer/
referer = context.request.META.get('HTTP_REFERER')
if referer is None:
return ''
# Build the string http://host:port/
prefix = '%s://%s' % (context.request.scheme, context.request.get_host())
path = referer.replace(prefix, '')
resolvermatch = resolve(path)
# Do something very interesting with this resolvermatch...
So I manually construct the string 'http://sub.domain.tld:port', then I remove it from the full path to HTTP_REFERER found in context.request.META. It works but it seems a bit overwhelming for me.
I tried to build a HttpRequest from referer without success. Is there a class or type that I can use to easily extract the path from an URL?
You can use urlparse module to extract the path:
try:
from urllib.parse import urlparse # Python 3
except ImportError:
from urlparse import urlparse # Python 2
parsed = urlparse('http://stackoverflow.com/questions/32809595')
print(parsed.path)
Output:
'/questions/32809595'

Regarding memory consumption in python with urllib2 module

PS: I have a similar question with Requests HTTP library here.
I am using python v2.7 on windows 7 OS. I am using urllib2 module. I have two code snippets. One file is named as myServer.py The server class has 2 methods named as getName(self,code) and getValue(self).
The other script named as testServer.py simply calls the methods from the server class to retrieve the values and prints them. The server class basically retrieves the values from a Server in my local network. So, unfortunately I can't provide you the access for testing the code.
Problem: When I execute my testServer.py file, I observed in the task manager that the memory consumption keeps increasing. Why is it increasing and how to avoid it? If I comment out the following line
print serverObj.getName(1234)
in testServer.py then there is no increase in memory consumption.
I am sure that the problem is with the getName(self,code) of the server class. But unfortunately, I couldn't figure out what the problem is.
Code: Please find the code snippets below:
#This is the myServer.py file
import urllib2
import json
import random
class server():
def __init__(self):
url1 = 'https://10.0.0.1/'
username = 'user'
password = 'passw0rd'
passwrdmgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
passwrdmgr.add_password(None, url1, username, password)
authhandler = urllib2.HTTPBasicAuthHandler(passwrdmgr)
opener = urllib2.build_opener(authhandler)
urllib2.install_opener(opener)
def getName(self, code):
code = str(code)
url = 'https://10.0.0.1/' + code
response = urllib2.urlopen(url)
data = response.read()
name = str(data).strip()
return name
def getValue(self):
value = random.randrange(0,11)
return value
The following is the testServer.py snippet
from myServer import server
import time
serverObj = server()
while True:
time.sleep(1)
print serverObj.getName(1234)
print serverObj.getValue()
Thank you for your time!
This is question is quite similar to my other question. So I think the answer is also quite similar. The answer can be found here https://stackoverflow.com/a/23172330/2382792

Python requests fails to get webpages

I am using Python3 and the package requests to fetch HTML data.
I have tried running the line
r = requests.get('https://github.com/timeline.json')
, which is the example on their tutorial, to no avail. However, when I run
request = requests.get('http://www.math.ksu.edu/events/grad_conf_2013/')
it works fine. I am getting errors such as
AttributeError: 'MockRequest' object has no attribute 'unverifiable'
Error in sys.excepthook:
I am thinking the errors have something to do with the type of webpage I am attempting to get, since the html page that is working is just basic html that I wrote.
I am very new to requests and Python in general. I am also new to stackoverflow.
As a little example, here is a little tool which I developed in order to fetch data from a website, in this case IP and show it:
# Import the requests module
# TODO: Make sure to install it first
import requests
# Get the raw information from the website
r = requests.get('http://whatismyipaddress.com')
raw_page_source_list = r.text
text = ''
# Join the whole list into a single string in order
# to simplify things
text = text.join(raw_page_source_list)
# Get the exact starting position of the IP address string
ip_text_pos = text.find('IP Information') + 62
# Now extract the IP address and store it
ip_address = text[ip_text_pos : ip_text_pos + 12]
# print 'Your IP address is: %s' % ip_address
# or, for Python 3 ... #
# print('Your IP address is: %s' % ip_address)

How do you IXR_Base64 in python?

What I'm trying to do is upload a picture to wordpress using wp.uploadFile xmlrpc method.
To do this, in PHP there is an example here: https://stackoverflow.com/a/8910496/1212382
I'm trying to do the same thing in python but I don't know how.
Anyone any ideas?
ok, the answer lies in the xmlrpclib class.
To send base64 bits to wordpress from python you need to use the xmlrpclib class like so:
base64bits = xmlrpclib.Binary(file_content)
then you just add the base64bits variable to the 'bits' parameter in your wp.uploadFile xmlrpc request.
to be a little more exact, here's the complete code in python of how this should be done:
import xmlrpclib
import urllib2
from datetime import date
import time
def get_url_content(url):
try:
content = urllib2.urlopen(url)
return content.read()
except:
print 'error! NOOOOOO!!!'
file_url = 'http://the path to your picture'
extension = file_url.split(".")
leng = extension.__len__()
extension = extension[leng-1]
if (extension=='jpg'):
xfileType = 'image/jpeg'
elif(extension=='png'):
xfileType='image/png'
elif(extension=='bmp'):
xfileType = 'image/bmp'
file = get_url_content(file_url)
file = xmlrpclib.Binary(file)
server = xmlrpclib.Server('http://website.com/xmlrpc.php')
filename = str(date.today())+str(time.strftime('%H:%M:%S'))
mediarray = {'name':filename+'.'+extension,
'type':xfileType,
'bits':file,
'overwrite':'false'}
xarr = ['1', 'USERHERE', 'PASSWORDHERE', mediarray]
result = server.wp.uploadFile(xarr)
print result

Categories