Here's what I'm doing:
from urllib import quote_plus
email = quote_plus('name+other@gmail.com')
but
print email
> name+other%40gmail.com
so if I put it in a URL the + becomes a space when I pull it out. But I want it to stay a plus. What am I doing wrong?
Try defining the safe parameter explicitly, as in the code below:
from urllib import quote_plus
email = quote_plus('name+other@gmail.com', safe='/')
print email
If that works, the only explanation I can think of is that the source of urllib.quote_plus has somehow been modified on your machine...
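For reference, a minimal round-trip sketch (using Python 3's urllib.parse, where quote_plus now lives): as long as the literal + is percent-encoded as %2B, unquoting gives the plus back instead of a space.
from urllib.parse import quote_plus, unquote_plus

email = 'name+other@gmail.com'
encoded = quote_plus(email)   # '+' becomes '%2B', '@' becomes '%40'
print(encoded)                # name%2Bother%40gmail.com
print(unquote_plus(encoded))  # round-trips back to name+other@gmail.com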
I have a requests.cookies.RequestsCookieJar object which contains multiple cookies from different domains/paths. How can I extract a cookies string for a particular domain/path, following the rules mentioned here?
For example
>>> r = requests.get("https://stackoverflow.com")
>>> print(r.cookies)
<RequestsCookieJar[<Cookie prov=4df137f9-848e-01c3-f01b-35ec61022540 for .stackoverflow.com/>]>
# the function I expect
>>> getCookies(r.cookies, "stackoverflow.com")
"prov=4df137f9-848e-01c3-f01b-35ec61022540"
>>> getCookies(r.cookies, "meta.stackoverflow.com")
"prov=4df137f9-848e-01c3-f01b-35ec61022540"
# meta.stackoverflow.com is also satisfied as it is subdomain of .stackoverflow.com
>>> getCookies(r.cookies, "google.com")
""
# r.cookies does not contain any cookies for google.com, so it returns an empty string
I think you need to work with a Python dictionary of the cookies. (See my comment above.)
def getCookies(cookie_jar, domain):
    cookie_dict = cookie_jar.get_dict(domain=domain)
    found = ['%s=%s' % (name, value) for (name, value) in cookie_dict.items()]
    return ';'.join(found)
Your example:
>>> r = requests.get("https://stackoverflow.com")
>>> getCookies(r.cookies, ".stackoverflow.com")
"prov=4df137f9-848e-01c3-f01b-35ec61022540"
NEW ANSWER
Ok, so I still don't get exactly what it is you are trying to achieve.
If you want to extract the originating url from a requests.RequestCookieJar object (so that you could then check if there is a match with a given subdomain) that is (as far as I know) impossible.
However, you could of course do something like:
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import requests
import re
class getCookies():
    def __init__(self, url):
        self.cookiejar = requests.get(url).cookies
        self.url = url

    def check_domain(self, domain):
        # strip the leftmost subdomain (if any) to get the base domain
        try:
            base_domain = re.compile(r"(?<=\.).+\..+$").search(domain).group()
        except AttributeError:
            base_domain = domain
        if base_domain in self.url:
            print("\"prov=" + str(dict(self.cookiejar)["prov"]) + "\"")
        else:
            print("No cookies for " + domain + " in this jar!")
Then if you do:
new_instance = getCookies("https://stackoverflow.com")
You could then do:
new_instance.check_domain("meta.stackoverflow.com")
Which would give the output:
"prov=5d4fda78-d042-2ee9-9a85-f507df184094"
While:
new_instance.check_domain("google.com")
Would output:
"No cookies for google.com in this jar!"
Then, if needed, you can fine-tune the regex and create a list of URLs: in a first loop, create an instance per URL and save them in, e.g., a list or dict; in a second loop, check another list of URLs to see if their cookies might be present in any of the instances, as sketched below.
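A minimal sketch of that two-loop idea (the URL lists here are hypothetical placeholders, and the base-domain heuristic is the same one check_domain uses):
import re
import requests

sources = ["https://stackoverflow.com", "https://www.google.com"]  # URLs to harvest cookies from
to_check = ["meta.stackoverflow.com", "google.com"]                # domains to look up

jars = {url: requests.get(url).cookies for url in sources}

for domain in to_check:
    # strip the leftmost subdomain (if any) to get the base domain
    match = re.search(r"(?<=\.).+\..+$", domain)
    base = match.group() if match else domain
    for url, jar in jars.items():
        if base in url:
            print(domain, "->", dict(jar))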
OLD ANSWER
The docs you link to explain:
items()
Dict-like items() that returns a list of name-value tuples from the jar. Allows client-code to call dict(RequestsCookieJar) and get a vanilla Python dict of key-value pairs.
I think what you are looking for is:
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import requests
def getCookies(url):
    r = requests.get(url)
    print("\"prov=" + str(dict(r.cookies)["prov"]) + "\"")
Now I can run it like this:
>>> getCookies("https://stackoverflow.com")
"prov=f7712c78-b489-ee5f-5e8f-93c85ca06475"
Actually, I had the same problem as you. But when I looked at the class definition
class RequestsCookieJar(cookielib.CookieJar, MutableMapping):
I found a method called def get_dict(self, domain=None, path=None):
Alternatively, if you have a raw Set-Cookie string, you can simply write code like this:
from http.cookies import SimpleCookie

raw = "rawCookie"  # your raw Set-Cookie string
mycookie = SimpleCookie()
mycookie.load(raw)
UCookie = {}
for key, morsel in mycookie.items():
    UCookie[key] = morsel.value
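And a minimal sketch of the get_dict route (get_dict is a real RequestsCookieJar method; both arguments are optional filters):
import requests

r = requests.get("https://stackoverflow.com")
# domain/path narrow the jar down to the cookies you asked about
print(r.cookies.get_dict(domain=".stackoverflow.com"))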
The following code is not promised to be "forward compatible", because I am accessing attributes of classes that were (kind of) intentionally hidden by their authors; however, if you must get at the attributes of a cookie, take a look here:
import requests

aresponse = requests.get('https://www.att.com')
requestscookiejar = aresponse.cookies
for cdomain, cooks in requestscookiejar._cookies.items():
    for cpath, cookgrp in cooks.items():
        for cname, cattribs in cookgrp.items():
            print(cattribs.version)
            print(cattribs.name)
            print(cattribs.value)
            print(cattribs.port)
            print(cattribs.port_specified)
            print(cattribs.domain)
            print(cattribs.domain_specified)
            print(cattribs.domain_initial_dot)
            print(cattribs.path)
            print(cattribs.path_specified)
            print(cattribs.secure)
            print(cattribs.expires)
            print(cattribs.discard)
            print(cattribs.comment)
            print(cattribs.comment_url)
            print(cattribs.rfc2109)
            print(cattribs._rest)
When you only need the simple attributes of cookies, it is likely less complicated to go about it the following way. This avoids the use of RequestsCookieJar. Here we construct a single SimpleCookie instance by reading from the headers attribute of a response object instead of the cookies attribute. The name SimpleCookie would seem to imply a single cookie, but that isn't what a SimpleCookie is. Try it out:
import http.cookies
import requests

def parse_cookies(http_response):
    cookie_grp = http.cookies.SimpleCookie()
    for h, v in http_response.headers.items():
        if 'set-cookie' in h.lower():
            for cook in v.split(','):
                cookie_grp.load(cook)
    return cookie_grp
aresponse = requests.get('https://www.att.com')
cookies = parse_cookies(aresponse)
print(str(cookies))
You can get the list of domains in a RequestsCookieJar and then dump the cookies for each domain with the following code:
import requests
response = requests.get("https://stackoverflow.com")
cjar = response.cookies
for domain in cjar.list_domains():
    print(f'Cookies for {domain}: {cjar.get_dict(domain=domain)}')
Outputs:
Cookies for .stackoverflow.com: {'prov': 'efe8c1b7-ddbd-4ad5-9060-89ea6c29479e'}
In this example, only one domain is listed. The output would have multiple lines if there were cookies for multiple domains in the jar.
For many use cases, the cookie jar can be serialized by simply ignoring domains, calling:
dCookies = cjar.get_dict()
We can easily extract a cookies string for a particular domain/path using functions already available in the requests lib.
import requests
from requests.models import Request
from requests.cookies import get_cookie_header
session = requests.session()
r1 = session.get("https://www.google.com")
r2 = session.get("https://stackoverflow.com")
cookie_header1 = get_cookie_header(session.cookies, Request(method="GET", url="https://www.google.com"))
# '1P_JAR=2022-02-19-18; NID=511=Hz9Mlgl7DtS4uhTqjGOEolNwzciYlUtspJYxQ0GWOfEm9u9x-_nJ1jpawixONmVuyua59DFBvpQZkPzNAeZdnJjwiB2ky4AEFYVV'
cookie_header2 = get_cookie_header(session.cookies, Request(method="GET", url="https://stackoverflow.com"))
# 'prov=883c41a4-603b-898c-1d14-26e30e3c8774'
Request is used to prepare a PreparedRequest, which is sent to the server.
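As a cross-check, a small sketch that lets the session itself prepare the request and then reads the Cookie header it would send (this uses only the public requests API):
import requests

session = requests.Session()
session.get("https://stackoverflow.com")

# prepare_request merges the session's cookies into the outgoing request
prepared = session.prepare_request(
    requests.Request(method="GET", url="https://stackoverflow.com"))
print(prepared.headers.get("Cookie"))  # e.g. 'prov=...'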
What you need is the get_dict() method:
import json
import requests

a_session = requests.Session()
a_session.get('https://google.com/')
session_cookies = a_session.cookies
cookies_dictionary = session_cookies.get_dict()
# Now just print it or convert to JSON
as_string = json.dumps(cookies_dictionary)
print(cookies_dictionary)
While trying to access a file whose name contains UTF-8 chars from the browser, I get the error:
The requested URL /images/0/04/פתרונות_תרגילים_על_משטחים_דיפ'_2014.pdf was not found on this server.
Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.
In order to access the files I wrote the following python script:
# encoding: utf8
__author__ = 'Danis'
__date__ = '20/10/14'
import urllib
curr_link = u'http://math-wiki.com/images/0/04/2014_\'דיפ_משטחים_על_פתרונות.nn uft8pdf'
urllib.urlretrieve(curr_link, 'home/danisf/targil4.pdf')
but when I run the code I get the error URLError:<curr_link appears here> contains non-ASCII characters
How can I fix the code to get it to work? (By the way, I don't have access to the server or to the webmaster.) Or maybe the browser failed not because of the bad encoding of the file name?
You cannot just pass Unicode URLs into urllib functions; URLs must be valid bytestrings instead. You'll need to encode to UTF-8, then URL-quote the path of your URL:
import urllib
import urlparse
curr_link = u'http://math-wiki.com/images/0/04/2014_\'דיפ_משטחים_על_פתרונות.nn uft8pdf'
parsed_link = urlparse.urlsplit(curr_link.encode('utf8'))
parsed_link = parsed_link._replace(path=urllib.quote(parsed_link.path))
encoded_link = parsed_link.geturl()
urllib.urlretrieve(encoded_link, 'home/danisf/targil4.pdf')
The specific URL you provided in your question, however, produces a 404 error.
Demo:
>>> import urllib
>>> import urlparse
>>> curr_link = u'http://math-wiki.com/images/0/04/2014_\'דיפ_משטחים_על_פתרונות.nn uft8pdf'
>>> parsed_link = urlparse.urlsplit(curr_link.encode('utf8'))
>>> parsed_link = parsed_link._replace(path=urllib.quote(parsed_link.path))
>>> print parsed_link.geturl()
http://math-wiki.com/images/0/04/2014_%27%D7%93%D7%99%D7%A4_%D7%9E%D7%A9%D7%98%D7%97%D7%99%D7%9D_%D7%A2%D7%9C_%D7%A4%D7%AA%D7%A8%D7%95%D7%A0%D7%95%D7%AA.nn%20uft8pdf
Your browser usually decodes UTF-8 bytes encoded like this to present a readable URL, but when it sends the URL to the server to retrieve the page, it encodes it in the exact same manner.
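For reference, a minimal round-trip sketch of that encoding (Python 3's urllib.parse here; the Python 2 code above behaves the same way):
from urllib.parse import quote, unquote

path = "/images/0/04/פתרונות_תרגילים_על_משטחים_דיפ'_2014.pdf"
encoded = quote(path)            # what actually goes over the wire
print(encoded)                   # /images/0/04/%D7%A4%D7%AA%D7%A8... and so on
print(unquote(encoded) == path)  # True: the readable form round-trips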
I need to perform HTTP PUT operations from Python. Which libraries have been proven to support this? More specifically, I need to perform PUT on key pairs, not file upload.
I have been trying to work with the restful_lib.py, but I get invalid results from the API that I am testing. (I know the results are invalid because I can fire off the same request with curl from the command line and it works.)
After attending PyCon 2011, I came away with the impression that pycurl might be my solution, so I have been trying to implement it. I have two issues here. First, pycurl renames "PUT" as "UPLOAD", which seems to imply that it is focused on file uploads rather than key pairs. Second, when I try to use it, I never seem to get a return from the .perform() step.
Here is my current code:
import pycurl
import urllib

url = 'https://xxxxxx.com/xxx-rest'
UAM = pycurl.Curl()

def on_receive(data):
    print data

arglist = [
    ('username', 'testEmailAdd@test.com'),
    ('email', 'testEmailAdd@test.com'),
    ('username', 'testUserName'),
    ('givenName', 'testFirstName'),
    ('surname', 'testLastName')]
encodedarg = urllib.urlencode(arglist)
path2 = url + "/user/" + "99b47002-56e5-4fe2-9802-9a760c9fb966"
UAM.setopt(pycurl.URL, path2)
UAM.setopt(pycurl.POSTFIELDS, encodedarg)
UAM.setopt(pycurl.SSL_VERIFYPEER, 0)
UAM.setopt(pycurl.UPLOAD, 1)  # set method to "PUT"
UAM.setopt(pycurl.CONNECTTIMEOUT, 1)
UAM.setopt(pycurl.TIMEOUT, 2)
UAM.setopt(pycurl.WRITEFUNCTION, on_receive)
print "about to perform"
print UAM.perform()
httplib should do the job.
http://docs.python.org/library/httplib.html
There's an example on this page http://effbot.org/librarybook/httplib.htm
urllib and urllib2 are also suggested.
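For instance, a minimal httplib PUT sketch for form-encoded key pairs (the host and path here are placeholders, not your API):
import urllib
import httplib

conn = httplib.HTTPSConnection('example.com')  # placeholder host
body = urllib.urlencode([('key', 'value')])
conn.request('PUT', '/some/resource', body,
             {'Content-Type': 'application/x-www-form-urlencoded'})
response = conn.getresponse()
print response.status, response.read()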
Thank you all for your assistance. I think I have found an answer.
My code now looks like this:
import urllib
import httplib
from lxml import etree

url = 'xxxx.com'
UAM = httplib.HTTPSConnection(url)
arglist = [
    ('username', 'testEmailAdd@test.com'),
    ('email', 'testEmailAdd@test.com'),
    ('username', 'testUserName'),
    ('givenName', 'testFirstName'),
    ('surname', 'testLastName')
]
encodedarg=urllib.urlencode(arglist)
uuid="99b47002-56e5-4fe2-9802-9a760c9fb966"
path= "/uam-rest/user/"+uuid
UAM.putrequest("PUT", path)
UAM.putheader('content-type','application/x-www-form-urlencoded')
UAM.putheader('accepts','application/com.internap.ca.uam.ama-v1+xml')
UAM.putheader("Content-Length", str(len(encodedarg)))
UAM.endheaders()
UAM.send(encodedarg)
response = UAM.getresponse()
html = etree.HTML(response.read())
result = etree.tostring(html, pretty_print=True, method="html")
print result
Updated: Now I am getting valid responses. This seems to be my solution. (The pretty-print at the end isn't working, but I don't really care; it is just there while I am building the function.)
When I run this:
import urllib
feed = urllib.urlopen("http://www.yahoo.com")
print feed
I get this output in the interactive window (PythonWin):
<addinfourl at 48213968 whose fp = <socket._fileobject object at 0x02E14070>>
I'm expecting to get the source of the above URL. I know this has worked on other computers (like the ones at school) but this is on my laptop and I'm not sure what the problem is here. Also, I don't understand this error at all. What does it mean? Addinfourl? fp? Please help.
Try this:
print feed.read()
See Python docs here.
urllib.urlopen actually returns a file-like object, so to retrieve the contents you will need to use:
import urllib
feed = urllib.urlopen("http://www.yahoo.com")
print feed.read()
In Python 3:
import urllib.request

url = "http://www.yahoo.com"  # for example
fh = urllib.request.urlopen(url)
html = fh.read().decode("iso-8859-1")
fh.close()
print(html)
How can I find the public-facing IP for my network in Python?
This will fetch your remote IP address
import urllib
ip = urllib.urlopen('http://automation.whatismyip.com/n09230945.asp').read()
If you don't want to rely on someone else, then just upload something like this PHP script:
<?php echo $_SERVER['REMOTE_ADDR']; ?>
and change the URL in the Python code above. Or, if you prefer ASP:
<%
Dim UserIPAddress
UserIPAddress = Request.ServerVariables("REMOTE_ADDR")
%>
Note: I don't know ASP, but I figured it might be useful to have here so I googled.
https://api.ipify.org/?format=json is pretty straightforward;
it can be parsed by just running requests.get("https://api.ipify.org/?format=json").json()['ip']
whatismyip.org is better... it just tosses back the ip as plaintext with no extraneous crap.
import urllib
ip = urllib.urlopen('http://whatismyip.org').read()
But yeah, it's impossible to do it easily without relying on something outside the network itself.
import requests
r = requests.get(r'http://jsonip.com')
# r = requests.get(r'https://ifconfig.co/json')
ip= r.json()['ip']
print('Your IP is {}'.format(ip))
If you don't mind expletives then try:
http://wtfismyip.com/json
Bind it up in the usual urllib stuff as others have shown.
There's also:
http://www.networksecuritytoolkit.org/nst/tools/ip.php
import re
import urllib2

text = urllib2.urlopen('http://www.whatismyip.org').read()
urlRE = re.findall(r'[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}', text)
urlRE
['146.148.123.123']
Try putting whatever 'findmyipsite' services you can find into a list and iterating through them for comparison, as sketched below. This one seems to work well.
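A minimal sketch of that idea (the service URLs are examples of plain-text IP echo services and may change over time):
import requests

services = [
    'https://icanhazip.com',
    'https://api.ipify.org',
    'https://ifconfig.me/ip',
]

results = set()
for url in services:
    try:
        results.add(requests.get(url, timeout=5).text.strip())
    except requests.RequestException:
        pass  # skip services that are down

print(results)  # a single-element set means the services agree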
This is as simple as
>>> import urllib
>>> urllib.urlopen('http://icanhazip.com/').read().strip('\n')
'xx.xx.xx.xx'
You can also use DNS, which in some cases may be more reliable than HTTP-based methods:
#!/usr/bin/env python3
# pip install --user dnspython
import dns.resolver
resolver1_opendns_ip = False
resolver = dns.resolver.Resolver()
opendns_result = resolver.resolve("resolver1.opendns.com", "A")
for record in opendns_result:
    resolver1_opendns_ip = record.to_text()

if resolver1_opendns_ip:
    resolver.nameservers = [resolver1_opendns_ip]
    myip_result = resolver.resolve("myip.opendns.com", "A")
    for record in myip_result:
        print(f"Your external ip is {record.to_text()}")
This is the Python equivalent of dig +short -4 myip.opendns.com @resolver1.opendns.com