LDAP: ldap.SIZELIMIT_EXCEEDED - python

I am getting an ldap.SIZELIMIT_EXCEEDED error when I run this code:
import ldap
url = 'ldap://<domain>:389'
binddn = 'cn=<username> readonly,cn=users,dc=tnc,dc=org'
password = '<password>'
conn = ldap.initialize(url)
conn.simple_bind_s(binddn,password)
base_dn = "ou=People,dc=tnc,dc=org"
filter = '(objectClass=*)'
attrs = ['sn']
conn.search_s( base_dn, ldap.SCOPE_SUBTREE, filter, attrs )
Where username is my actual username, password is my actual password, and domain is the actual domain.
I don't understand why this is happening. Can somebody shed some light?

Manual: http://www.python-ldap.org/doc/html/ldap.html
exception ldap.SIZELIMIT_EXCEEDED
An LDAP size limit was exceeded. This could be due to a sizelimit configuration on the LDAP server.
I think your best bet here is to limit the size of the result set you receive from the server. You can do that by setting the attribute LDAPObject.sizelimit (deprecated) or by passing the sizelimit parameter to search_ext().
You should also make sure your bind was actually successful...
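For example, a minimal sketch passing a client-side limit via search_ext(), reusing the variables from the question (the limit of 200 is illustrative; the server may still enforce a lower one):
try:
    # ask the server for at most 200 entries
    msgid = conn.search_ext(base_dn, ldap.SCOPE_SUBTREE, filter, attrs, sizelimit=200)
    rtype, rdata = conn.result(msgid)
except ldap.SIZELIMIT_EXCEEDED:
    pass  # more than 200 entries matched the search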

You're encountering that exception most likely because the server you're communicating with has more results than can be returned by a single request. In order to get around this you need to use paged results which can be done by using SimplePagedResultsControl.
Here's a Python3 implementation that I came up with after heavily editing what I found here and in the official documentation. At the time of writing this it works with the pip3 package python-ldap version 3.2.0.
import ldap
from ldap.controls import SimplePagedResultsControl

def get_list_of_ldap_users():
    hostname = "<domain>:389"
    username = "username_here"
    password = "password_here"
    base = "ou=People,dc=tnc,dc=org"

    print(f"Connecting to the LDAP server at '{hostname}'...")
    connect = ldap.initialize(f"ldap://{hostname}")
    connect.set_option(ldap.OPT_REFERRALS, 0)
    connect.simple_bind_s(username, password)

    search_flt = "(objectClass=*)"
    page_size = 500  # how many users to search for in each page; depends on the server's maximum setting (the default highest value is often 1000)
    searchreq_attrlist = ["sn"]  # change these to the attributes you care about

    req_ctrl = SimplePagedResultsControl(criticality=True, size=page_size, cookie='')
    msgid = connect.search_ext(base=base, scope=ldap.SCOPE_SUBTREE, filterstr=search_flt,
                               attrlist=searchreq_attrlist, serverctrls=[req_ctrl])

    total_results = []
    pages = 0
    while True:  # loop over all of the pages using the same cookie, otherwise the search will fail
        pages += 1
        rtype, rdata, rmsgid, serverctrls = connect.result3(msgid)
        total_results.extend(rdata)

        pctrls = [c for c in serverctrls
                  if c.controlType == SimplePagedResultsControl.controlType]
        if pctrls and pctrls[0].cookie:
            # copy the cookie from the response control into the request control and fetch the next page
            req_ctrl.cookie = pctrls[0].cookie
            msgid = connect.search_ext(base=base, scope=ldap.SCOPE_SUBTREE, filterstr=search_flt,
                                       attrlist=searchreq_attrlist, serverctrls=[req_ctrl])
        else:
            break
    return total_results
This will return a list of all users but you can edit it as required to return what you want without hitting the SIZELIMIT_EXCEEDED issue :)
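Usage is then just a call (a sketch; each entry is a (dn, attrs) tuple as returned by python-ldap):
users = get_list_of_ldap_users()
print(f"Fetched {len(users)} entries across all pages")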

See here for what to do when you get this error:
How get get more search results than the server's sizelimit with Python LDAP?

The filter you provided, (objectClass=*), is a presence filter. In this case it limits the results of the search request to objects in the directory at and underneath the base object you supplied - which is every object underneath the base object, since every object has at least one objectClass. Restrict your search by using a more restrictive filter, a tighter scope, a lower base object, or all three. For more information on the topic of the search request, see Using ldapsearch and LDAP: Programming Practices.
Directory server administrators are free to impose a server-wide limit on the number of entries that can be returned to LDAP clients; this is known as a server-imposed size limit. There is also a time limit which follows the same rules.
LDAP clients should always supply a size limit and a time limit with a search request. These limits, known as client-requested limits, cannot override the server-imposed limits, however.
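For instance, a sketch of a narrower search against the question's directory (the ou value and the filter are illustrative), supplying client-requested limits through search_ext_s():
results = conn.search_ext_s(
    'ou=Engineering,ou=People,dc=tnc,dc=org',  # lower, illustrative base object
    ldap.SCOPE_ONELEVEL,                       # tighter scope than SUBTREE
    '(sn=Smith)',                              # more restrictive than (objectClass=*)
    ['sn'],
    sizelimit=100,  # client-requested size limit
    timeout=10)     # client-requested time limit, in seconds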

Active Directory defaults to returning a maximum of 1000 results. What is sort of annoying is that rather than returning the first 1000 results along with an error code, it seems to send the error code without any data.
eDirectory starts with no default, and is completely configurable to whatever you like.
Other directories handle it differently. (Edit and add in, if you know).

You must use paged search to achieve this.
The page size would depend on your ldap server, 1000 would work for Active Directory.
Have a look at http://google-apps-for-your-domain-ldap-sync.googlecode.com/svn/trunk/ldap_ctxt.py for an example


How to use self.env.commit() properly?

Could someone please explain to me how self.env.cr.commit() works, when to use it and some good practices?
From the Odoo documentation it seems that the use of cr.commit() is very dangerous. This is my first time using it, and I am not sure how to use it properly for my use case.
Edit:
More information for my use case: I am creating shipments through a shipping provider API. Let's say my API call is successful and I have created a shipment, but while handling the response I have to raise a UserError for some reason, and my changes are rolled back. So now the state of the shipment is different in Odoo and on the shipping provider's server, which is unacceptable.
So if I am calling the method create_dhl_shipment() and the flag variable is True (an error occurred during the last API call), then I would like to delete the original shipment and create a new one.
And my problem is: how do I make a change in the database and keep it from being rolled back?
During my search on the internet, I came across cr.commit(), but the Odoo documentation really discourages using it.
A very simplified example:
class StockPickingInherited(models.Model):
    _inherit = 'stock.picking'

    remnant_shipment = fields.Boolean("Possible remnant shipment")
    packages = fields.One2many("stock.shipment.package", "picking_id")

    def create_dhl_shipment(self):
        response_from_shipping_provider = requests.get("API URL")
        if response_from_shipping_provider.status_code != 200:
            if not self.remnant_shipment:
                raise UserError("Shipment creation failed")
            else:
                self.write({"packages": [(5, 0, 0)]})
                # write data into the db and keep the change from being rolled back when UserError is raised
                self.env.cr.commit()
                raise UserError("Shipment creation failed")
Am I doing it right? Are there some potential dangers?
It is true that self.env.cr.commit() should be used sparingly. However, there are some legitimate use cases for it. For example:
@api.model
def some_cron_job(self):
    for record in self.env[...].search(...):
        record.do_some_process()
        self.env.cr.commit()
The above is fine because you are doing some process on a batch of records, and to avoid the cron job redoing it over and over because of an error on one of the records, you can commit after each record has been processed. (PS: the above could be made safer with try...except..., or by marking a record as "failed", for example. This can be done in conjunction with commit().)
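A sketch of that safer variant (the model name, domain, and "failed" flag are made up for illustration):
@api.model
def some_cron_job(self):
    for record in self.env['my.model'].search([]):  # illustrative model and domain
        try:
            record.do_some_process()
        except Exception:
            record.write({'failed': True})  # hypothetical flag so the cron can skip or report it
        self.env.cr.commit()  # persist this record's outcome before moving on to the next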
In your use case you want to display a message to the user, but your problem is that once you raise, the transaction will roll back, so to counter this you call commit().
I have had similar situations, and I sometimes find that using an @api.onchange works well for showing a message to the user.
@api.onchange('your_field')
def onchange_your_field(self):
    if self.your_field > 100:
        raise UserError("Thanks for entering the field")
However, I am no longer a fan of that approach, since Odoo has gotten rid of most @api.onchange methods in favour of computed fields. (There are good reasons for the move, so try and avoid it too.)
Luckily there is a new way to do this, introduced in Odoo Version 13.0.
def some_method(self):
    # Do something
    self.write({"something": False})
    # Display message
    return {
        'type': 'ir.actions.client',
        'tag': 'display_notification',
        'params': {
            'title': title,
            'message': message,
            'sticky': False,
        }
    }
The above is taken from here in the Odoo repo where a message is displayed to the user if the mail server credentials are correct.
The following rule is taken from the Odoo guidelines (never commit the transaction):
You should NEVER call cr.commit() yourself, UNLESS you have created your own database cursor explicitly! And the situations where you need to do that are exceptional! And by the way, if you did create your own cursor, then you need to handle error cases and proper rollback, as well as properly close the cursor when you’re done with it.
In the following example, we create a new cursor to avoid a rollback that could be caused by an upper method:
registry = odoo.registry(self.env.cr.dbname)
with registry.cursor() as cr:
    env = api.Environment(cr, SUPERUSER_ID, {})
For more details check the well-documented Odoo guidelines
You can find an example of this in the auth_ldap module.
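Applied to the asker's situation, a hedged sketch of that pattern (the shipment.log model and its field names are invented for the example); work done in the separate cursor survives even if the outer transaction rolls back:
import odoo
from odoo import api, SUPERUSER_ID

def log_shipment_result(self, message):
    # hypothetical helper: records the API outcome in its own transaction
    registry = odoo.registry(self.env.cr.dbname)
    with registry.cursor() as cr:  # commits on exit, rolls back on exception
        env = api.Environment(cr, SUPERUSER_ID, {})
        env['shipment.log'].create({'picking_id': self.id, 'message': message})
    # the caller may now raise UserError without losing the log entry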

Handling optional Parameters in Python

I'm trying to solve an issue related to my API and want to refactor my code to be backward compatible: if the front end doesn't send the data the server expects, the request should still go through, and if the front end does send it, everything should work as usual.
Roughly, my create function works fine. However, when the front-end team sent data without what the server expects, it broke with a 500 Internal Server Error. I want to make those fields optional: even when the expected data is not sent, I want a 200 HTTP response. Here is where the code breaks, because of the KeyError on job_invoice. I've tried to use break in my for loop to bypass it, but still.
job_invoice_data = inv_data['job_invoice']
job_invoice = JobInvoice.objects.create(job=job_instance, **job_invoice_data)
obj.job_invoice = job_invoice
# Create an InvoiceLineItem for each element in invoice_line_item
for invoice_line_item in inv_data['invoice_line_item']:
    invoice_line_item['job_invoice'] = job_invoice.id
    if invoice_line_item['job_invoice'] is None:
        break
    invoice_line_item_serializer = InvoiceLineItemSerializer(data=invoice_line_item)
    if invoice_line_item_serializer.is_valid():
        invoice_line_item_obj = invoice_line_item_serializer.save()
    else:
        logger.debug("Couldn't create invoice line item: {}".format(invoice_line_item))
Simple solution: use try/except blocks.
try:
    # parse all parameters here, e.g.:
    job_invoice_data = inv_data['job_invoice']
except KeyError:
    job_invoice_data = None  # fall back to a default instead of letting the request fail
More complex:
Check the availability of the parameters and proceed only when they are available, using None as the default value.
For example, in Flask you can do something like this:
@app.route("/func/<required_param>", defaults={"opt1": None, "opt2": None, "opt3": None})
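Applied to the asker's DRF code, the same idea with dict.get (a sketch; the key and variable names are taken from the question):
job_invoice_data = inv_data.get('job_invoice')  # None when the key is absent
if job_invoice_data is not None:
    job_invoice = JobInvoice.objects.create(job=job_instance, **job_invoice_data)
    obj.job_invoice = job_invoice
    for invoice_line_item in inv_data.get('invoice_line_item', []):
        # an empty default list simply skips the loop instead of raising KeyError
        invoice_line_item['job_invoice'] = job_invoice.id
        # ... then validate and save with InvoiceLineItemSerializer as before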

GAE REQUEST_LOG_ID env variable is wrong

Using the code below I'm sending an email on error. I'm trying to include a link to the Cloud Console logs in the email but the request ID seems to be wrong about 30% of the time.
If I find the request with the wrong ID, it's almost a perfect match except that the last three characters are 0 (in the Stackdriver console) instead of 101 (returned from the env variable) - always the same substitution. Is this a bug with the Cloud Console, or am I trying to use these IDs wrong?
The code (stripped down version):
class ErrorAlertMiddleware(object):
    def process_response(self, request, response):
        if response.status_code == 500:
            logger.info(os.environ.get('REQUEST_LOG_ID'))
            msg = 'Link to logs: https://console.cloud.google.com/logs/viewer?' + '&'.join((
                'project=%s' % MY_APP_ID,
                'expandAll=true',
                'filters=request_id:%s' % os.environ.get('REQUEST_LOG_ID'),
                'resource=gae_app',
            ))
            # this is a utility func that simply sends email
            sendemail(ERROR_RECIPIENT, msg)
        return response
Note that I've also logged the REQUEST_LOG_ID to ensure it's not being encoded or something; the log output matches what shows in the link.
Instead of os.environ.get('REQUEST_LOG_ID'), use request.environ.get('REQUEST_LOG_ID').
It may be possible that os.environ['REQUEST_LOG_ID'] changes between the start of the current request and the time you access it, but request.environ['REQUEST_LOG_ID'] should not change once the request is initialized. The docs state that if one request ID is greater than another, then it occurred later than the other. This implies that the request ID in the Stackdriver console was generated before the one in your email link. This makes me think that somewhere along the line, os.environ['REQUEST_LOG_ID'] is being updated from '....000' to '....101' before you access it, while the copy in request.environ['REQUEST_LOG_ID'] remains unchanged.
For more info on the request.environ, take a look at the source code in google.appengine.runtime.request_environment.py. I haven't really found documentation on that, but that code led me to believe that the os.environ is not as safe to access as the request.environ.
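In the middleware above, that amounts to a one-line change (a sketch; request here is the Django request object, whose environ dict the GAE runtime populates):
msg = 'Link to logs: https://console.cloud.google.com/logs/viewer?' + '&'.join((
    'project=%s' % MY_APP_ID,
    'expandAll=true',
    'filters=request_id:%s' % request.environ.get('REQUEST_LOG_ID'),  # per-request copy, stable for this request
    'resource=gae_app',
))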

Find out if the current machine is on aws in python

I have a python script that runs on aws machines, as well as on other machines.
The functionality of the script depends on whether or not it is on AWS.
Is there a way to programmatically discover whether or not it runs on AWS? (maybe using boto?)
If you want to do that strictly using boto, you could do:
import boto.utils
md = boto.utils.get_instance_metadata(timeout=.1, num_retries=0)
The timeout parameter specifies how long the HTTP client will wait for a response before timing out. The num_retries parameter controls how many times the client will retry the request before giving up and returning an empty dictionary.
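Since an unreachable metadata service yields an empty dictionary, a truthiness check answers the question directly:
is_on_aws = bool(md)  # empty dict -> metadata service unreachable -> likely not on EC2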
You can easily use the AWS SDK and check for an instance ID.
Besides that, you can check the AWS IP ranges - check out this link:
https://forums.aws.amazon.com/ann.jspa?annID=1701
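A sketch of that approach (AWS publishes its ranges as JSON at ip-ranges.amazonaws.com; how you obtain your public IP is up to you and is assumed here as the my_public_ip argument):
import ipaddress
import json
import urllib.request

def ip_in_aws_ranges(my_public_ip):
    # published, regularly updated list of AWS address ranges
    with urllib.request.urlopen('https://ip-ranges.amazonaws.com/ip-ranges.json') as f:
        ranges = json.load(f)
    addr = ipaddress.ip_address(my_public_ip)
    return any(addr in ipaddress.ip_network(p['ip_prefix'])
               for p in ranges['prefixes'])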
I found a way, using:
import requests

try:
    # the metadata service at 169.254.169.254 is only reachable from inside EC2;
    # the timeout is added here so the call fails fast when off AWS
    instance_id_resp = requests.get('http://169.254.169.254/latest/meta-data/instance-id', timeout=1)
    is_on_aws = True
except requests.exceptions.ConnectionError:
    is_on_aws = False
I tried some of the above, and when not running on Amazon I had trouble accessing 169.254.169.254. Maybe it has something to do with the fact that I'm outside the US.
In any case, here's a piece of code that worked for me:
def running_on_amazon():
    import urllib2
    import socket
    # I'm using curlmyip.com, but there are other websites that provide the same service
    ip_finder_addr = "http://curlmyip.com"
    f = urllib2.urlopen(ip_finder_addr)
    my_ip = f.read(100).strip()
    host_addr = socket.gethostbyaddr(my_ip)
    my_public_name = host_addr[0]
    amazon = (my_public_name.find("aws") >= 0)
    return amazon  # returns a boolean value

Send a "304 Not Modified" for images stored in the datastore

I store user-uploaded images in the Google App Engine datastore as db.Blob, as proposed in the docs. I then serve those images on /images/<id>.jpg.
The server always sends a 200 OK response, which means that the browser has to download the same image multiple times (== slower) and that the server has to send the same image multiple times (== more expensive).
As most of those images will likely never change, I'd like to be able to send a 304 Not Modified response. I am thinking about calculating some kind of hash of the picture when the user uploads it, and then using this to know if the user already has this image (maybe send the hash as an ETag?).
I have found this answer and this answer that explain the logic pretty well, but I have 2 questions:
Is it possible to send an Etag in Google App Engine?
Has anyone implemented such logic, and/or is there any code snippet available?
Bloggart uses this technique. Have a look at this blog post.
class StaticContentHandler(webapp.RequestHandler):
    def output_content(self, content, serve=True):
        self.response.headers['Content-Type'] = content.content_type
        last_modified = content.last_modified.strftime(HTTP_DATE_FMT)
        self.response.headers['Last-Modified'] = last_modified
        self.response.headers['ETag'] = '"%s"' % (content.etag,)
        if serve:
            self.response.out.write(content.body)
        else:
            self.response.set_status(304)

    def get(self, path):
        content = get(path)
        if not content:
            self.error(404)
            return
        serve = True
        if 'If-Modified-Since' in self.request.headers:
            last_seen = datetime.datetime.strptime(
                self.request.headers['If-Modified-Since'],
                HTTP_DATE_FMT)
            if last_seen >= content.last_modified.replace(microsecond=0):
                serve = False
        if 'If-None-Match' in self.request.headers:
            etags = [x.strip('" ')
                     for x in self.request.headers['If-None-Match'].split(',')]
            if content.etag in etags:
                serve = False
        self.output_content(content, serve)
There might be a simpler solution here. This requires that you never overwrite the data associated with any identifier, e.g. modifying the image would create a new id (and hence a new URL).
Simply set the Expires header from your request handler to the far future, e.g. now + a year. This would result in clients caching the image and not asking for an update until that time comes.
This approach has some tradeoffs, like ensuring new URLs are embedded when images are modified, so you have to decide for yourself. What jbochi is proposing is the other alternative that puts more logic into the image request handler.
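A minimal sketch of that far-future Expires approach (the handler name and the load_image lookup are hypothetical; the one-year figure follows the answer, and it assumes image IDs are never reused for new content):
import datetime

HTTP_DATE_FMT = '%a, %d %b %Y %H:%M:%S GMT'

class CachedImageHandler(webapp.RequestHandler):  # hypothetical handler
    def get(self, image_id):
        image = load_image(image_id)  # hypothetical datastore lookup
        expires = datetime.datetime.utcnow() + datetime.timedelta(days=365)
        self.response.headers['Content-Type'] = 'image/jpeg'
        self.response.headers['Expires'] = expires.strftime(HTTP_DATE_FMT)
        self.response.headers['Cache-Control'] = 'public, max-age=31536000'
        self.response.out.write(image.data)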
By the way, thanks to webob, webapp.RequestHandler provides an easy way to check If-None-Match:
if etag in self.request.if_none_match:
    pass  # do something
Why would the code use this:
self.response.headers['ETag'] = '"%s"' % (content.etag,)
instead of this:
self.response.headers['ETag'] = '"%s"' % content.etag
I think they are the same and will use the 2nd unless someone explains the reasoning. (For what it's worth, the one-element tuple is the defensive form: '%s' % value misbehaves if value happens to be a tuple, while '%s' % (value,) always formats it as a single item.)
