Python Dict Switcher results in memory leak - python

I have extensively read about immutable and mutable objects in Python for a couple of months now and I seem to begin to understand the concept. Still I cannot spot the problem why my code below produces memory leaks. The dicts function as references to immutable records of specific type. In many cases, I get an update of an existing record, in this case, the existing record will only be updated if the two records (oldrecord and newrecord) are not equal. However, I have the feeling that newrecord gets never deleted if oldrecord and newrecord match, although all references appear to cease to exist in such a case.
My question:
Is the code below good practice for selecting a reference to a dict based on record type or should I do it differently (e.g. through dictSwitcher)?
class myRecordDicts():
def __init__(self, type1Dict=dict(), type2Dict=dict(),
type3Dict=dict(),type4Dict=dict(),type5Dict=dict(),type6Dict=dict(), type7Dict=dict()):
self.type1Dict = type1Dict
self.type2Dict = type2Dict
self.type3Dict = type3Dict
self.type4Dict = type4Dict
self.type5Dict = type5Dict
self.type6Dict = type6Dict
self.type7Dict = type7Dict
def dictSelector(self, record):
dictSwitcher = {
myCustomRecordType1().name: self.type1Dict,
myCustomRecordType2().name: self.type2Dict,
myCustomRecordType3().name: self.type3Dict,
myCustomRecordType4().name: self.type4Dict,
myCustomRecordType5().name: self.type5Dict,
myCustomRecordType6().name: self.type6Dict,
myCustomRecordType7().name: self.type7Dict,
}
return dictSwitcher.get(record.name)
def AddRecordToDict(self, newrecord):
dict = self.dictSelector(newrecord)
recordID = newrecord.id
if recordID in dict:
oldrecord = dict[recordID]
self.MergeExistingRecords(oldrecord,newrecord)
else:
dict[recordID] = newrecord
def MergeExistingRecords(self, oldrecord, newrecord):
# Basic Compare function
oldRecordString = oldrecord.SerializeToString()
newRecordString = newrecord.SerializeToString()
# no need to do anything if same length
if not len(oldRecordString) == len(newRecordString):
oldrecord.CustomMergeFrom(newrecord)

Well, it seems always like that: I was working for hours on this problem and could not make progress. 5 Minutes after formulating the question correctly on StackExchange, I found my issue:
I needed to remove the references in init since I was never passing dicts when instantiating myRecordsDicts(), the following code does not leak memory:
class myRecordDicts():
def __init__(self):
self.type1Dict = dict()
self.type2Dict = dict()
self.type3Dict = dict()
self.type4Dict = dict()
self.type5Dict = dict()
self.type6Dict = dict()
self.type7Dict = dict()

Related

Method __init__ has too many parameters

I'm super new to Python (I started about 3 weeks ago) and I'm trying to make a script that scrapes web pages for information. After it's retrieved the information it runs through a function to format it and then passes it to a class that takes 17 variables as parameters. The class uses this information to calculate some other variables and currently has a method to construct a dictionary. The code works as intended but a plugin I'm using with Pycharm called SonarLint highlights that 17 variables is too many to use as parameters?
I've had a look for alternate ways to pass the information to the class, such as in a tuple or a list but couldn't find much information that seemed relevant. What's the best practice for passing many variables to a class as parameters? Or shouldn't I be using a class for this kind of thing at all?
I've reduced the amount of variables and code for legibility but here is the class;
Class GenericEvent:
def __init__(self, type, date_scraped, date_of_event, time, link,
blurb):
countdown_delta = date_of_event - date_scraped
countdown = countdown_delta.days
if countdown < 0:
has_passed = True
else:
has_passed = False
self.type = type
self.date_scraped = date_scraped
self.date_of_event = date_of_event
self.time = time
self.link = link
self.countdown = countdown
self.has_passed = has_passed
self.blurb = blurb
def get_dictionary(self):
event_dict = {}
event_dict['type'] = self.type
event_dict['scraped'] = self.date_scraped
event_dict['date'] = self.date_of_event
event_dict['time'] = self.time
event_dict['url'] = self.link
event_dict['countdown'] = self.countdown
event_dict['blurb'] = self.blurb
event_dict['has_passed'] = self.has_passed
return event_dict
I've been passing the variables as key:value pairs to the class after I've cleaned up the data the following way:
event_info = GenericEvent(type="Lunar"
date_scraped=30/01/19
date_of_event=28/07/19
time=12:00
link="www.someurl.com"
blurb="Some string.")
and retrieving a dictionary by calling:
event_info.get_dictionary()
I intend to add other methods to the class to be able to perform other operations too (not just to create 1 dictionary) but would like to resolve this before I extend the functionality of the class.
Any help or links would be much appreciated!
One option is a named tuple:
from typing import Any, NamedTuple
class GenericEvent(NamedTuple):
type: Any
date_scraped: Any
date_of_event: Any
time: Any
link: str
countdown: Any
blurb: str
#property
def countdown(self):
countdown_delta = date_of_event - date_scraped
return countdown_delta.days
#property
def has_passed(self):
return self.countdown < 0
def get_dictionary(self):
return {
**self._asdict(),
'countdown': self.countdown,
'has_passed': self.has_passed,
}
(Replace the Anys with the fields’ actual types, e.g. datetime.datetime.)
Or, if you want it to be mutable, a data class.
I don't think there's anything wrong with what you're doing. You could, however, take your parameters in as a single dict object, and then deal with them by iterating over the dict or doing something explicitly with each one. Seems like that would, in your case, make your code messier.
Since all of your parameters to your constructor are named parameters, you could just do this:
def __init__(self, **params):
This would give you a dict named params that you could then process. The keys would be your parameter names, and the values the parameter values.
If you aligned your param names with what you want the keys to be in your get_dictionary method's return value, saving off this parameter as a whole could make that method trivial to write.
Here's an abbreviated version of your code (with a few syntax errors fixed) that illustrates this idea:
from pprint import pprint
class GenericEvent:
def __init__(self, **params):
pprint(params)
event_info = GenericEvent(type="Lunar",
date_scraped="30/01/19",
date_of_event="28/07/19",
time="12:00",
link="www.someurl.com",
blurb="Some string.")
Result:
{'blurb': 'Some string.',
'date_of_event': '28/07/19',
'date_scraped': '30/01/19',
'link': 'www.someurl.com',
'time': '12:00',
'type': 'Lunar'}

Improving code for similar code found

I passed codeclimate to my code, and I obtained the following:
Similar code found in 1 other location
This is my code:
stradd = 'iterable_item_added'
if stradd in ddiff:
added = ddiff[stradd]
npos_added = parseRoots(added)
dics_added = makeAddDicts(localTable, pk, npos_added)
else:
dics_added = []
strchanged = 'values_changed'
if strchanged in ddiff:
updated = ddiff[strchanged]
npos_updated = parseRoots(updated)
dics_updated = makeUpdatedDicts(localTable, pk, npos_updated)
else:
dics_updated = []
Where iterable_item_added and values_changed are repeated. How to change it?
just abstract the parameters and create an helper method:
def testmethod(name,localTable,m,ddiff,pk):
if name in ddiff:
npos = parseRoots(ddiff[name])
rval = m(localTable, pk, npos)
else:
rval = []
return rval
the call it:
dics_added = testmethod('iterable_item_added',localTable,makeAddDicts,ddiff,pk)
dics_updated = testmethod('values_changed',localTable,makeUpdatedDicts,ddiff,pk)
note: be careful when factorizing code, you can introduce bugs (and make code better readable :)).
Also: that helper method forces to pass a lot of local variables. Maybe creating an object and member variables would simplify even more.
In that case, it appears to be a bit "overkill" to do that in order to make your review tool shut up.

how to get class instances out of a dict in python

I have a class `Collection' that looks like this:
class Collection():
def __init__(self, db, collection_name):
self.db = db
self.collection_name = collection_name
if not hasattr(self.__class__, 'client'):
self.__class__.client = MongoClient()
self.data_base = getattr(self.client, self.db)
self.collection = getattr(self.data_base, self.collection_name)
def getCollectionKeys(self):
....etc.
I cleverly created a function to create a dictionary of class instances as follows:
def getCollections():
collections_dict = {}
for i in range(len(db_collection_names)):
collections_dict[db_collection_names[i]] = Collection(database_name, db_collection_names[i])
return collections_dict
it works. however, whenever I want to access a class instance, I have to go through the dictionary:
agents_keys = collections_dict['agents'].getCollectionKeys()
I would love to just write:
agents_keys = agents.getCollectionKeys()
Is there a simple way to get those instances "out" of the dict?
You can get a reference to items in a vanilla python dictionary using a generator object in a for loop, or by using a list expression.
agent_keys = [x.getCollectionKeys() for x in collections_dict.values()]
or this
agent_keys = []
for name in db_collection_names:
#do something with individual item
#there could also be some logic here about which keys to append
agent_keys.append(collections_dict[name].getCollectionKeys())
#now agent_keys is full of all the keys
My mental model of how objects are interacted with in python. Feel free to edit if you actually know how it works.
You cannot "take" items of the dictionary per say unless you call the del operator which removes the association of a variable name (that is what you type in the editor like "foo" and "bar") with an object ( the actual collections of bits in the program your machine sees). What you can do is get a reference to the object, which in python is a symbol that for all your intents and purposes is the object you want.
The dictionary just holds a bunch of references to your database objects.
The expression collections_dict['agents'] is equivalent to your original database object that you put into the dictionary like this
collections_dict['agents'] = my_particular_object

how to get entities which don't have certain attribute in datastore

I'm trying to make an appraisal system
This is my class
class Goal(db.Expando):
GID = db.IntegerProperty(required=True)
description = db.TextProperty(required=True)
time = db.FloatProperty(required=True)
weight = db.IntegerProperty(required=True)
Emp = db.UserProperty(auto_current_user=True)
Status = db.BooleanProperty(default=False)
Following things are given by employee,
class SubmitGoal(webapp.RequestHandler):
def post(self):
dtw = simplejson.loads(self.request.body)
try:
maxid = Goal.all().order("-GID").get().GID + 1
except:
maxid = 1
try:
g = Goal(GID=maxid, description=dtw[0], time=float(dtw[1]), weight=int(dtw[2]))
g.put()
self.response.out.write(simplejson.dumps("Submitted"))
except:
self.response.out.write(simplejson.dumps("Error"))
Now, here Manager checks the goals and approve it or not.. if approved then status will be stored as true in datastore else false
idsta = simplejson.loads(self.request.body)
try:
g = db.Query(Goal).filter("GID =", int(idsta[0])).get()
if g:
if idsta[1]:
g.Status=True
try:
del g.Comments
except:
None
else:
g.Status=False
g.Comments=idsta[2]
db.put(g)
self.response.out.write(simplejson.dumps("Submitted"))
except:
self.response.out.write(simplejson.dumps("Error"))
Now, this is where im stuck..."filter('status=',True)".. this is returning all the entities which has status true.. means which are approved.. i want those entities which are approved AND which have not been assessed by employee yet..
def get(self):
t = []
for g in Goal.all().filter("Status = ",True):
t.append([g.GID, g.description, g.time, g.weight, g.Emp])
self.response.out.write(simplejson.dumps(t))
def post(self):
idasm = simplejson.loads(self.request.body)
try:
g = db.Query(Goal).filter("GID =", int(idasm[0])).get()
if g:
g.AsmEmp=idasm[1]
db.put(g)
self.response.out.write(simplejson.dumps("Submitted"))
except:
self.response.out.write(simplejson.dumps("Error"))
How am I supposed to do this? as I know that if I add another filter like "filter('AsmEmp =', not None)" this will only return those entities which have the AsmEmp attribute what I need is vice versa.
You explicitly can't do this. As the documentation states:
It is not possible to query for entities that are missing a given property.
Instead, create a property for is_assessed which defaults to False, and query on that.
could you not simply add another field for when employee_assessed = db.user...
and only populate this at the time when it is assessed?
The records do not lack the attribute in the datastore, it's simply set to None. You can query for those records with Goal.all().filter('status =', True).filter('AsmEmp =', None).
A few incidental suggestions about your code:
'Status' is a rather unintuitive name for a boolean.
It's generally good Python style to begin properties and attributes with a lower-case letter.
You shouldn't iterate over a query directly. This fetches results in batches, and is much less efficient than doing an explicit fetch. Instead, fetch the number of results you need with .fetch(n).
A try/except with no exception class specified and no action taken when an exception occurs is a very bad idea, and can mask a wide variety of issues.
Edit: I didn't notice that you were using an Expando - in which case #Daniel's answer is correct. There doesn't seem to be any good reason to use Expando here, though. Adding the property to the model (and updating existing entities) would be the easiest solution here.

Copy an entity in Google App Engine datastore in Python without knowing property names at 'compile' time

In a Python Google App Engine app I'm writing, I have an entity stored in the datastore that I need to retrieve, make an exact copy of it (with the exception of the key), and then put this entity back in.
How should I do this? In particular, are there any caveats or tricks I need to be aware of when doing this so that I get a copy of the sort I expect and not something else.
ETA: Well, I tried it out and I did run into problems. I would like to make my copy in such a way that I don't have to know the names of the properties when I write the code. My thinking was to do this:
#theThing = a particular entity we pull from the datastore with model Thing
copyThing = Thing(user = user)
for thingProperty in theThing.properties():
copyThing.__setattr__(thingProperty[0], thingProperty[1])
This executes without any errors... until I try to pull copyThing from the datastore, at which point I discover that all of the properties are set to None (with the exception of the user and key, obviously). So clearly this code is doing something, since it's replacing the defaults with None (all of the properties have a default value set), but not at all what I want. Suggestions?
Here you go:
def clone_entity(e, **extra_args):
"""Clones an entity, adding or overriding constructor attributes.
The cloned entity will have exactly the same property values as the original
entity, except where overridden. By default it will have no parent entity or
key name, unless supplied.
Args:
e: The entity to clone
extra_args: Keyword arguments to override from the cloned entity and pass
to the constructor.
Returns:
A cloned, possibly modified, copy of entity e.
"""
klass = e.__class__
props = dict((k, v.__get__(e, klass)) for k, v in klass.properties().iteritems())
props.update(extra_args)
return klass(**props)
Example usage:
b = clone_entity(a)
c = clone_entity(a, key_name='foo')
d = clone_entity(a, parent=a.key().parent())
EDIT: Changes if using NDB
Combining Gus' comment below with a fix for properties that specify a different datastore name, the following code works for NDB:
def clone_entity(e, **extra_args):
klass = e.__class__
props = dict((v._code_name, v.__get__(e, klass)) for v in klass._properties.itervalues() if type(v) is not ndb.ComputedProperty)
props.update(extra_args)
return klass(**props)
Example usage (note key_name becomes id in NDB):
b = clone_entity(a, id='new_id_here')
Side note: see the use of _code_name to get the Python-friendly property name. Without this, a property like name = ndb.StringProperty('n') would cause the model constructor to raise an AttributeError: type object 'foo' has no attribute 'n'.
If you're using the NDB you can simply copy with:
new_entity.populate(**old_entity.to_dict())
This is just an extension to Nick Johnson's excellent code to address the problems highlighted by Amir in the comments:
The db.Key value of the ReferenceProperty is no longer retrieved via an unnecessary roundtrip to the datastore.
You can now specify whether you want to skip DateTime properties with the auto_now and/or auto_now_add flag.
Here's the updated code:
def clone_entity(e, skip_auto_now=False, skip_auto_now_add=False, **extra_args):
"""Clones an entity, adding or overriding constructor attributes.
The cloned entity will have exactly the same property values as the original
entity, except where overridden. By default it will have no parent entity or
key name, unless supplied.
Args:
e: The entity to clone
skip_auto_now: If True then all DateTimeProperty propertes will be skipped which have the 'auto_now' flag set to True
skip_auto_now_add: If True then all DateTimeProperty propertes will be skipped which have the 'auto_now_add' flag set to True
extra_args: Keyword arguments to override from the cloned entity and pass
to the constructor.
Returns:
A cloned, possibly modified, copy of entity e.
"""
klass = e.__class__
props = {}
for k, v in klass.properties().iteritems():
if not (type(v) == db.DateTimeProperty and ((skip_auto_now and getattr(v, 'auto_now')) or (skip_auto_now_add and getattr(v, 'auto_now_add')))):
if type(v) == db.ReferenceProperty:
value = getattr(klass, k).get_value_for_datastore(e)
else:
value = v.__get__(e, klass)
props[k] = value
props.update(extra_args)
return klass(**props)
The first if expression is not very elegant so I appreciate if you can share a better way to write it.
I'm neither Python nor AppEngine guru, but couldn't one dynamically get/set the properties?
props = {}
for p in Thing.properties():
props[p] = getattr(old_thing, p)
new_thing = Thing(**props).put()
A variation inspired in Nick's answer which handles the case in which your entity has a (repeated) StructuredProperty, where the StructuredProperty itself has ComputedProperties. It can probably be written more tersely with dict comprehension somehow, but here is the longer version that worked for me:
def removeComputedProps(klass,oldDicc):
dicc = {}
for key,propertType in klass._properties.iteritems():
if type(propertType) is ndb.StructuredProperty:
purged = []
for item in oldDicc[key]:
purged.append(removeComputedProps(propertType._modelclass,item))
dicc[key]=purged
else:
if type(propertType) is not ndb.ComputedProperty:
dicc[key] = oldDicc[key]
return dicc
def cloneEntity(entity):
oldDicc = entity.to_dict()
klass = entity.__class__
dicc = removeComputedProps(klass,oldDicc)
return klass(**dicc)
This can be tricky if you've renamed the underlying keys for your properties... which some people opt to do instead of making mass data changes
say you started with this:
class Person(ndb.Model):
fname = ndb.StringProperty()
lname = ndb.StringProperty()
then one day you really decided that it would be nicer to use first_name and last_name instead... so you do this:
class Person(ndb.Model):
first_name = ndb.StringProperty(name="fname")
last_name = ndb.StringProperty(name="lname")
now when you do Person._properties (or .properties() or person_instance._properties) you will get a dictionary with keys that match the underlying names (fname and lname)... but won't match the actual property names on the class... so it won't work if you put them into the constructor of a new instance, or use the .populate() method (the above examples will break)
In NDB anyways, instances of models have ._values dictionary which is keyed by the underlying property names... and you can update it directly. I ended up with something like this:
def clone(entity, **extra_args):
klass = entity.__class__
clone = klass(**extra_args)
original_values = dict((k,v) for k,v in entity._values.iteritems() if k not in clone._values)
clone._values.update(original_values)
return clone
This isn't really the safest way... as there are other private helper methods that do more work (like validation and conversion of computed properties by using _store_value() and _retrieve_value())... but if you're models are simple enough, and you like living on the edge :)
Here's the code provided by #zengabor with the if expression formatted for easier reading. It may not be PEP-8 compliant:
klass = e.__class__
props = {}
for k, v in klass.properties().iteritems():
if not (type(v) == db.DateTimeProperty and ((
skip_auto_now and getattr(v, 'auto_now' )) or (
skip_auto_now_add and getattr(v, 'auto_now_add')))):
if type(v) == db.ReferenceProperty:
value = getattr(klass, k).get_value_for_datastore(e)
else:
value = v.__get__(e, klass)
props[k] = value
props.update(extra_args)
return klass(**props)

Categories