Python-ldap ldap.initialize rejects a URL that ldapurl considers valid - python

I want to open a connection to an LDAP directory using an LDAP URL that will be given at run time. For example:
ldap://192.168.2.151/dc=directory,dc=example,dc=com
It is valid as far as I can tell, and the python-ldap URL parser ldapurl.LDAPUrl accepts it:
>>> import ldap, ldapurl
>>> url = 'ldap://192.168.2.151/dc=directory,dc=example,dc=com'
>>> parsed_url = ldapurl.LDAPUrl(url)
>>> parsed_url.dn
'dc=directory,dc=example,dc=com'
But if I use it to initialize an LDAPObject, I get an ldap.LDAPError exception:
>>> ldap.initialize(url)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/dist-packages/ldap/functions.py", line 91, in initialize
return LDAPObject(uri,trace_level,trace_file,trace_stack_limit)
File "/usr/lib/python2.7/dist-packages/ldap/ldapobject.py", line 70, in __init__
self._l = ldap.functions._ldap_function_call(ldap._ldap_module_lock,_ldap.initialize,uri)
File "/usr/lib/python2.7/dist-packages/ldap/functions.py", line 63, in _ldap_function_call
result = func(*args,**kwargs)
ldap.LDAPError: (0, 'Error')
I found that if I manually percent-encode the DN part of the URL, it works:
>>> url = 'ldap://192.168.2.151/dc=directory%2cdc=example%2cdc=com'
>>> # url is still valid
>>> parsed_url = ldapurl.LDAPUrl(url)
>>> parsed_url.dn
'dc=directory,dc=example,dc=com'
>>> # and it returns a valid connection
>>> ldap.initialize(url)
<ldap.ldapobject.SimpleLDAPObject instance at 0x1400098>
How can I ensure robust URL handling in ldap.initialize without encoding parts of the URL myself? (which, I'm afraid, won't be very robust anyway)

You can programmatically encode the last part of the URL:
try:
    from urllib.parse import quote  # Python 3.x
except ImportError:
    from urllib import quote  # Python 2.x

url = 'ldap://192.168.2.151/dc=directory,dc=paralint,dc=com'
idx = url.rindex('/') + 1
print(url[:idx] + quote(url[idx:], '='))
# => 'ldap://192.168.2.151/dc=directory%2Cdc=paralint%2Cdc=com'

One can use the LDAPUrl.unparse() method to get a properly encoded version of the URI, like this:
>>> import ldapurl
>>> url = ldapurl.LDAPUrl('ldap://192.168.2.151/dc=directory,dc=example,dc=com')
>>> url.unparse()
'ldap://192.168.2.151/dc%3Ddirectory%2Cdc%3Dexample%2Cdc%3Dcom???'
>>> ldap.initialize(url.unparse())
<ldap.ldapobject.SimpleLDAPObject instance at 0x103d998>
And LDAPUrl.unparse() will not re-encode an already encoded URL:
>>> url = ldapurl.LDAPUrl('ldap://example.com/dc%3Dusers%2Cdc%3Dexample%2Cdc%3Dcom%2F???')
>>> url.unparse()
'ldap://example.com/dc%3Dusers%2Cdc%3Dexample%2Cdc%3Dcom%2F???'
So you can use it blindly on any LDAP URI your program must handle.
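For instance, a small helper along these lines (a sketch; the connect_to_ldap_url name is just illustrative) would accept either a raw or an already encoded URI and normalize it before handing it to python-ldap:
import ldap
import ldapurl

def connect_to_ldap_url(uri):
    # LDAPUrl parses both the raw and the percent-encoded form;
    # unparse() always emits the encoded form that ldap.initialize() accepts.
    normalized = ldapurl.LDAPUrl(uri).unparse()
    return ldap.initialize(normalized)

conn = connect_to_ldap_url('ldap://192.168.2.151/dc=directory,dc=example,dc=com')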

Related

urllib.request.urlopen not accepting query string with spaces

I am taking a Udacity course on Python where we are supposed to check for profane words in a document. I am using the website http://www.wdylike.appspot.com/?q= (text_to_be_checked_for_profanity). The text to be checked can be passed as a query string in the above URL, and the website returns true or false after checking for profane words. Below is my code.
import urllib.request

# Read the content from a document
def read_content():
    quotes = open("movie_quotes.txt")
    content = quotes.read()
    quotes.close()
    check_profanity(content)

def check_profanity(text_to_read):
    connection = urllib.request.urlopen("http://www.wdylike.appspot.com/?q="+text_to_read)
    result = connection.read()
    print(result)
    connection.close

read_content()
It gives me the following error
Traceback (most recent call last):
File "/Users/Vrushita/Desktop/Rishit/profanity_check.py", line 21, in <module>
read_content()
File "/Users/Vrushita/Desktop/Rishit/profanity_check.py", line 11, in read_content
check_profanity(content)
File "/Users/Vrushita/Desktop/Rishit/profanity_check.py", line 16, in check_profanity
connection = urllib.request.urlopen("http://www.wdylike.appspot.com/?q="+text_to_read)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 163, in urlopen
return opener.open(url, data, timeout)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 472, in open
response = meth(req, response)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 582, in http_response
'http', request, response, code, msg, hdrs)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 510, in error
return self._call_chain(*args)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 444, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 590, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
The document that I am trying to read the content from contains the string "Hello world". However, if I change the string to "Hello+world", the same code works and returns the desired result. Can someone explain why this is happening and what is a workaround for this?
urllib accepts it; the server doesn't. And well it should not, because a space is not a valid URL character.
Escape your query string properly with urllib.parse.quote_plus(); it'll ensure your string is valid for use in query parameters. Or better still, use the urllib.parse.urlencode() function to encode all key-value pairs:
from urllib.parse import urlencode
params = urlencode({'q': text_to_read})
connection = urllib.request.urlopen(f"http://www.wdylike.appspot.com/?{params}")
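Dropping that into the original check_profanity function would look roughly like this (a sketch; only the URL construction changes):
import urllib.request
from urllib.parse import urlencode

def check_profanity(text_to_read):
    # urlencode() percent-encodes spaces and other special characters
    params = urlencode({'q': text_to_read})
    connection = urllib.request.urlopen("http://www.wdylike.appspot.com/?" + params)
    result = connection.read()
    print(result)
    connection.close()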
The response below is for Python 3.x.
A 400 Bad Request occurs when there is a space in your input text.
To avoid this, use parse, so import it:
from urllib import request, parse
If you are sending any text along with the URL, then quote the text:
url = "http://www.wdylike.appspot.com/?q="
url = url + parse.quote(input_to_check)
Check the explanation here - https://discussions.udacity.com/t/problem-in-profanity-with-python-3-solved/227328
The Udacity profanity checker program -
from urllib import request, parse

def read_file():
    fhand = open(r"E:\Python_Programming\Udacity\movie_quotes.txt")
    file_content = fhand.read()
    #print (file_content)
    fhand.close()
    profanity_check(file_content)

def profanity_check(input_to_check):
    url = "http://www.wdylike.appspot.com/?q="
    url = url + parse.quote(input_to_check)
    req = request.urlopen(url)
    answer = req.read()
    #print(answer)
    req.close()
    if b"true" in answer:
        print("Profanity Alert!!!")
    else:
        print("Nothing to worry")

read_file()
I think this code is closer to what the lesson was aiming at, illustrating the difference between native functions, classes and functions inside classes:
from urllib import request, parse

def read_text():
    quotes = open('C:/Users/Alejandro/Desktop/movie_quotes.txt', 'r+')
    contents_of_file = quotes.read()
    print(contents_of_file)
    check_profanity(contents_of_file)
    quotes.close()

def check_profanity(text_to_check):
    connection = request.urlopen('http://www.wdylike.appspot.com/?q=' + parse.quote(text_to_check))
    output = connection.read()
    # print(output)
    connection.close()
    if b"true" in output:
        print("Profanity Alert!!!")
    elif b"false" in output:
        print("This document has no curse words!")
    else:
        print("Could not scan the document properly")

read_text()
I'm working on the same project, also using Python 3 like most people here.
While looking for the solution in Python 3, I found this HowTo, and I decided to give it a try.
It seems that on some websites, including Google, connections made from code (for example, via the urllib module) sometimes do not work properly. Apparently this has to do with the User-Agent, which the website receives when the connection is built.
I did some further research and came up with the following solution:
First I imported URLopener from urllib.request and created a class called ForceOpen as a subclass of URLopener.
Now I could set a "regular" User-Agent via the class variable version inside the ForceOpen class. Then I just created an instance of it and used its open method in place of urlopen to open the URL.
(It works fine, but I'd still appreciate comments, suggestions or any feedback, also because I'm not absolutely sure if this way is a good alternative - many thanks)
from urllib.request import URLopener

class ForceOpen(URLopener):  # create a subclass of URLopener
    version = "Mozilla/5.0 (cmp; Konqueror ...)(Kubuntu)"

force_open = ForceOpen()  # create an instance of it

def read_text():
    quotes = open(
        "/.../profanity_editor/data/quotes.txt"
    )
    contents_of_file = quotes.read()
    print(contents_of_file)
    quotes.close()
    check_profanity(contents_of_file)

def check_profanity(text_to_check):
    # now use the open method to open the URL
    connection = force_open.open(
        "http://www.wdylike.appspot.com/?q=" + text_to_check
    )
    output = connection.read()
    connection.close()
    if b"true" in output:
        print("Attention! Curse word(s) have been detected.")
    elif b"false" in output:
        print("No curse word(s) found.")
    else:
        print("Error! Unable to scan document.")

read_text()

Obfuscate password in url

I want to obscure a password in a URL for logging purposes. I was hoping to use urlparse, by parsing, replacing the password with a dummy password, and unparsing, but this is giving me:
>>> from urllib.parse import urlparse
>>> parts = urlparse('https://user:pass@66.66.66.66/aaa/bbb')
>>> parts.password = 'xxx'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: can't set attribute
So the alternative seems to be this, which seems overkill.
Is there an easier way of replacing the password, using the standard library?
urlparse returns a (subclass of) named tuple. Use the namedtuple._replace() method to produce a new copy, and use geturl() on that to 'unparse'.
The password is part of the netloc attribute, which can be parsed further:
from urllib.parse import urlparse

def replace_password(url):
    parts = urlparse(url)
    if parts.password is not None:
        # split out the host portion manually. We could use
        # parts.hostname and parts.port, but then you'd have to check
        # if either part is None. The hostname would also be lowercased.
        host_info = parts.netloc.rpartition('@')[-1]
        parts = parts._replace(netloc='{}:xxx@{}'.format(
            parts.username, host_info))
        url = parts.geturl()
    return url
Demo:
>>> replace_password('https://user:pass@66.66.66.66/aaa/bbb')
'https://user:xxx@66.66.66.66/aaa/bbb'

urllib3.urlencode googlescholar url from string

I am trying to encode a string into a URL to search Google Scholar, only to soon realize that urlencode is not provided in urllib3.
>>> import urllib3
>>> string = "https://scholar.google.com/scholar?" + urllib3.urlencode( {"q":"rudra banerjee"} )
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'urlencode'
So I checked the urllib3 docs and found that I possibly need request_encode_url. But I have no experience using it and failed.
>>> string = "https://scholar.google.com/scholar?" +"rudra banerjee"
>>> url = urllib3.request_encode_url('POST',string)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'request_encode_url'
So, how can I encode a string into a URL?
NB: I don't have any particular attachment to urllib3, so any other module will also do.
To simply encode fields in a URL, you can use urllib.urlencode.
In Python 2, this should do the trick:
import urllib
s = "https://scholar.google.com/scholar?" + urllib.urlencode({"q":"rudra banerjee"})
print(s)
# Prints: https://scholar.google.com/scholar?q=rudra+banerjee
In Python 3, it lives under urllib.parse.urlencode instead.
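For reference, the Python 3 equivalent of the snippet above would be something like:
from urllib.parse import urlencode

s = "https://scholar.google.com/scholar?" + urlencode({"q": "rudra banerjee"})
print(s)
# Prints: https://scholar.google.com/scholar?q=rudra+banerjee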
(Edit: I assumed you wanted to download the URL, not simply encode it. My mistake. I'll leave this answer as a reference for others, but see the other answer for encoding a URL.)
If you pass a dictionary into fields, urllib3 will take care of encoding it for you. First, you'll need to instantiate a pool for your connections. Here's a full example:
import urllib3
http = urllib3.PoolManager()
r = http.request('POST', 'https://scholar.google.com/scholar', fields={"q":"rudra banerjee"})
print(r.data)
Calling .request(...) will take care of figuring out the encoding for you based on the method.
Getting started examples are here: https://urllib3.readthedocs.org/en/latest/index.html#usage
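If the goal is a plain Scholar search URL rather than a POST body, the same call with 'GET' should encode the fields into the query string instead (a sketch, not tested against Scholar itself):
import urllib3

http = urllib3.PoolManager()
# With GET, urllib3 appends the encoded fields to the URL as a query string
r = http.request('GET', 'https://scholar.google.com/scholar',
                 fields={'q': 'rudra banerjee'})
print(r.status)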

UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode

I know many people have encountered this error before, but I couldn't find the solution to my problem.
I have a URL that I want to normalize:
url = u"http://www.dgzfp.de/Dienste/Fachbeitr%C3%A4ge.aspx?EntryId=267&Page=5"
scheme, host_port, path, query, fragment = urlsplit(url)
path = urllib.unquote(path)
path = urllib.quote(path,safe="%/")
This gives an error message:
/usr/lib64/python2.6/urllib.py:1236: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
res = map(safe_map.__getitem__, s)
Traceback (most recent call last):
File "url_normalization.py", line 246, in <module>
logging.info(get_canonical_url(url))
File "url_normalization.py", line 102, in get_canonical_url
path = urllib.quote(path,safe="%/")
File "/usr/lib64/python2.6/urllib.py", line 1236, in quote
res = map(safe_map.__getitem__, s)
KeyError: u'\xc3'
I tried removing the unicode indicator "u" from the URL string, and then I do not get the error message. But how can I get rid of the unicode automatically, since I read the URL directly from a database?
urllib.quote() does not properly handle Unicode strings. To get around this, you can call the .encode() method on the URL when reading it (or on the variable you read from the database). So run url = url.encode('utf-8'). With this you get:
import urllib
import urlparse
from urlparse import urlsplit
url = u"http://www.dgzfp.de/Dienste/Fachbeitr%C3%A4ge.aspx?EntryId=267&Page=5"
url = url.encode('utf-8')
scheme, host_port, path, query, fragment = urlsplit(url)
path = urllib.unquote(path)
path = urllib.quote(path,safe="%/")
and then your output for the path variable will be:
>>> path
'/Dienste/Fachbeitr%C3%A4ge.aspx'
Does this work?
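(As an aside, under Python 3 the same quoting works on str directly, since urllib.parse.quote encodes non-ASCII characters to UTF-8 by default, so the manual .encode() step isn't needed; roughly:)
from urllib.parse import quote, unquote, urlsplit

url = "http://www.dgzfp.de/Dienste/Fachbeitr%C3%A4ge.aspx?EntryId=267&Page=5"
path = urlsplit(url).path
# quote() accepts str and encodes non-ASCII characters as UTF-8
path = quote(unquote(path), safe="%/")
print(path)  # /Dienste/Fachbeitr%C3%A4ge.aspx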

How to encode my url parameter in google app engine?

I do geocoding with Python and I think I need to encode the variable region with urlencode so that it works with content that has whitespace and other special characters:
url = urllib.urlencode('http://maps.googleapis.com/maps/api/geocode/json?address='+region+'&sensor=false')
logging.info('url:'+url)
result = urlfetch.fetch(url)
It generates an error log when the variable region contains whitespace:
Traceback (most recent call last):
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
return method(*args, **kwargs)
File "/base/data/home/apps/s~montaoproject/pricehandling.355268396595012751/in.py", line 153, in get
url = urllib.urlencode('http://maps.googleapis.com/maps/api/geocode/json?address='+region+'&sensor=false')
File "/base/python27_runtime/python27_dist/lib/python2.7/urllib.py", line 1275, in urlencode
raise TypeError
TypeError: not a valid non-string sequence or mapping object
The background is another question I asked, where I had a problem that I'm troubleshooting: the code works, but not for regions that are two or more words, i.e. names containing whitespace.
https://stackoverflow.com/questions/8441063/how-should-i-use-urlfetch-here
In production I used another variable. I thought it did not matter that it had whitespace. When I try variables that do not contain whitespace, it works.
So could you please tell me how I should encode the url variable so that it accepts whitespace and other "special" characters?
Thank you
Just encode your query string part, like:
param = {"address" : region,
"sensor" : "false"
}
or
param = [("address", region), ("sensor", "false")]
then
encoded_param = urllib.urlencode(param)
url = 'http://maps.googleapis.com/maps/api/geocode/json'
url = url + '?' + encoded_param
result = urlfetch.fetch(url)
Use urllib.pathname2url; it works directly on a single string value, no need for a dictionary.
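For example, on the URL from the question it would look roughly like this (a sketch, Python 2; the region value is made up):
import urllib

region = "New York"
# pathname2url percent-encodes the space as %20
url = ('http://maps.googleapis.com/maps/api/geocode/json?address='
       + urllib.pathname2url(region) + '&sensor=false')
# http://maps.googleapis.com/maps/api/geocode/json?address=New%20York&sensor=false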
