How do I update GeoJSON/Leaflet in Python without re-rendering?

I am using dash-leaflet and adding things to the map as GeoJSON in Python. I add a lot of objects (100-ish), and I do so often. The feature list is almost entirely the same on each run, with only a few things added or removed. As of now, I only know how to send the entire list of features at once. How do I add or remove specific objects while keeping everything else?
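For reference, here is a minimal sketch of the full-replacement pattern described above (the component ids, the dcc.Interval trigger, and the build_features helper are illustrative assumptions, not from the original post):

import dash_leaflet as dl
from dash import Dash, Input, Output, dcc, html

def build_features():
    # hypothetical stand-in for however the app currently produces
    # its ~100 GeoJSON features on each run
    return [{"type": "Feature",
             "geometry": {"type": "Point", "coordinates": [10.0, 56.0]},
             "properties": {}}]

app = Dash(__name__)
app.layout = html.Div([
    dl.Map([dl.TileLayer(), dl.GeoJSON(id="features")],
           center=(56, 10), zoom=6, style={"height": "80vh"}),
    dcc.Interval(id="tick", interval=2000),
])

@app.callback(Output("features", "data"), Input("tick", "n_intervals"))
def refresh(_):
    # the entire FeatureCollection is re-sent through the data prop,
    # so the client re-renders every feature even if only a few changed
    return {"type": "FeatureCollection", "features": build_features()}

Every callback fire pushes the whole collection to the client, which is exactly the cost the question is trying to avoid.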

Related

Why is document.py:__setattr__() getting called more times after replacing a regular object with a MongoEngine document

I'm new to Python and have been tasked with optimizing some code, and I'm trying to understand why my change has slowed things down. The code I'm working with is in a backend Flask app.
The change I made involved removing a temporary object that was being used to stage data before copying all fields to a MongoEngine document. All fields would get assigned to this temporary object, and then a conversion function cast all fields to their proper data types for storage. Instead of using this temporary object, I just instantiated the MongoEngine document and replaced all lines that were assigning to the temporary object with assignments to the document. I didn't add any lines, just replaced existing ones.
When I checked the changes using the Werkzeug application profiler for Flask, it showed 336,897 calls to __setattr__() before the changes and 502,953 calls after.
I'm just wondering if there's any explanation for this other than me inadvertently increasing the calls somehow (I don't think this is the case, because I've reviewed the changes with git diff a few times and didn't notice anything).
I appreciate any help I can get. Sorry for not providing any code examples (I don't want to expose the company's code). However, if needed I can try my best to write some example code to show what I did.
(Screenshots: profiler output showing the __setattr__() call counts before and after the changes.)
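One plausible explanation, with a hedged sketch (the class and field names are illustrative, not from the post): MongoEngine documents override __setattr__ in their base document.py to track which fields changed, so every assignment that previously hit a plain object's default setattr now goes through that override and gets counted by the profiler.

from mongoengine import Document, IntField, StringField

class Record(Document):
    name = StringField()
    count = IntField()

# Before: assignments land on a plain object, whose default
# object.__setattr__ never shows up as document.py:__setattr__.
class TempRecord(object):
    pass

tmp = TempRecord()
tmp.name = "x"
tmp.count = 1

# After: each assignment runs MongoEngine's overridden __setattr__,
# which records the field as changed -- one extra profiled call per
# field assignment, multiplied across every document you build.
doc = Record()
doc.name = "x"
doc.count = 1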

Python: how to garbage collect strings

I'm having a problem in a long-running script. The script runs in a multithreaded environment and performs crawling tasks.
In long executions, the script's memory consumption becomes huge, and after profiling memory with guppy's hpy, I saw that most of the problem comes from strings.
I'm not storing that many strings: I just read the content of HTML pages into memory to store them in the DB. After that, each string is not used anymore (the variable containing it is reassigned to the next string).
The problem arose because I saw that every new string (checked with sys.getrefcount) has at least 2 references (1 from my variable, and 1 internal). It seems that reassigning another value to my variable does not remove the internal reference, so the string stays in memory.
What can I do to be sure that strings are garbage collected?
Thanks in advance.
EDIT:
1- I'm using the Django ORM.
2- I'm obtaining all of these strings from two sources:
2.1- Directly from the socket (urllib2.urlopen(url).read())
2.2- Parsing those responses, extracting new URIs from every HTML page, and feeding them back into the system
SOLVED
Finally, I found the key. The script is part of a Django environment, and it seems that Django's internals were doing some caching or something similar. I turned off debugging, and everything started to work as expected (reusing identifiers seems to drop the references to the old objects, and those objects get collected by the gc).
For anyone who uses some kind of framework layer over Python, be aware of the configuration: it seems that some debug configurations combined with intensive processing can lead to memory leaks.
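For context, the usual culprit behind this symptom (an assumption on my part; the post only says debugging was turned off) is Django's query log: with DEBUG = True, every executed SQL query is appended to each connection's queries list, so a long-running crawler grows without bound.

# settings.py -- long-running workers should not run with DEBUG on,
# because Django appends every executed query to connection.queries
DEBUG = False

# Alternatively, if DEBUG must stay on, clear the log periodically:
from django.db import reset_queries
reset_queries()  # empties the per-connection query log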
You say:
I saw that every new string (with sys.getrefcount) has at least 2 references.
But did you carefully read the description of getrefcount()?
sys.getrefcount(object)
Return the reference count of the object. The count returned is generally one higher than you might expect, because it includes the (temporary) reference as an argument to getrefcount().
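Concretely (a tiny demonstration, not from the original answer):

import sys

s = "<html>...</html>" * 1000  # a fresh, non-interned string
# prints 2: one reference from s, plus the temporary reference created
# by passing s as an argument to getrefcount() itself
print(sys.getrefcount(s))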
You should explain more about your program.
What is the size of the HTML strings it holds?
How are they obtained? Are you sure you close all file handles and all socket connections?
You'd need to find out who keeps the "internal" reference to your strings.
Perhaps it's the library you're using to write to the DB (you didn't specify how you write to the DB).
I find objgraph very useful for tasks like this: https://pypi.python.org/pypi/objgraph
E.g.
import objgraph

# mystring is the string whose lingering references you want to trace;
# this renders a graph of its back-references to a.png
objgraph.show_backrefs([mystring], filename='a.png')

Map data type for Python Google App Engine

I would like to have a map data type for one of my entity types in my Python Google App Engine application. I think what I need is essentially the Python dict data type, where I can create a list of key-value mappings. I don't see any obvious way to do this with the datatypes provided by App Engine.
The reason I'd like to do this is that I have a User entity, and within that user I'd like to track a mapping of lessonIds to values representing that user's status with a particular lesson id. I'd like to do this without creating a whole new entity that might be titled UserLessonStatus, which would reference the User and have to be queried, since I often want to iterate through all the lesson statuses. Maybe it is better done that way, in which case I'd appreciate opinions confirming that's how it's best done. Otherwise, if someone knows a good way to create a mapping within my User entity itself, that'd be great.
One solution I considered is using two ListProperties in conjunction: when adding an object, append the key to one list and the value to the other; when locating, find the index of the key in one list and use that index in the other; when removing, find the index in one list and use it to remove the entry from both; and so forth.
You're probably better off using another kind, as you suggest. If you do want to store it all in the one entity, though, you have several options: parallel lists, as you suggest, are one. You could also simply pickle a Python dictionary, assuming you don't want to query on it.
You may also want to check out the ndb project, which supports nested entities; that would also be a viable solution.
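Here is a minimal sketch of the "pickle a dictionary" option, assuming the classic google.appengine.ext.db API of that era (the property and method names are illustrative):

import pickle

from google.appengine.ext import db

class User(db.Model):
    # pickled {lessonId: status} mapping, opaque to the datastore
    lesson_status_blob = db.BlobProperty()

    def get_lesson_statuses(self):
        if self.lesson_status_blob is None:
            return {}
        return pickle.loads(self.lesson_status_blob)

    def set_lesson_statuses(self, mapping):
        self.lesson_status_blob = db.Blob(pickle.dumps(mapping))

This keeps the whole mapping on the User entity, so iterating all lesson statuses needs no extra query, at the cost of not being able to filter on individual entries.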

Examples of use for PickledObjectField (django-picklefield)?

Surfing the web and reading about Django dev best practices, the advice is to use pickled model fields with extreme caution.
But in a real life example, where would you use a PickledObjectField, to solve what specific problems?
We have a system of social-network "backends" which do some generic stuff like "post message", "get status", "get friends", etc. The link between each backend class and a user is a Django model, which keeps the user, the backend name, and the credentials. Now imagine how many auth systems there are: OAuth, plain passwords, Facebook's obscure JS stuff, etc. This is where JSONField shines: we keep all backend-specific auth data in a dictionary on this model, which is stored in the DB as JSON, and we can put anything into it, no problem.
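A hedged sketch of the model described above (the field names are guesses; the post predates Django's built-in models.JSONField, but in modern Django it would look roughly like this):

from django.conf import settings
from django.db import models

class SocialBackendLink(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL,
                             on_delete=models.CASCADE)
    backend_name = models.CharField(max_length=50)
    # arbitrary per-backend auth data: OAuth tokens, plain passwords,
    # whatever the backend needs, stored as JSON
    credentials = models.JSONField(default=dict)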
You would use it to store... almost-arbitrary Python objects. In general there's little reason to use it; JSON is safer and more portable.
You can definitely substitute a PickledObjectField with JSON plus some extra logic to create an object out of the JSON. At the end of the day, whether you use a PickledObjectField or JSON+logic, your use case is serializing a Python object into your database. If you can trust the data in the Python object, and you know it will always be serializable, you can reasonably use the PickledObjectField. In my case (I don't use Django's ORM, but this should still apply), I have a couple of different object types that can go into my PickledObjectField, and their definitions are constantly mutating. Rather than constantly updating my JSON parsing logic to create an object out of JSON values, I simply use a PickledObjectField to store the different objects and later retrieve them in perfectly usable form (calling their functions). Caveat: if you store an object via PickledObjectField, then change the object's definition, and then retrieve the object, the old object may have trouble fitting into the new definition (depending on what you changed).
The problems to be solved are the efficiency and the convenience of defining and handling a complex object consisting of many parts.
You can turn each part type into a Model and connect them via ForeignKeys.
Or you can turn each part type into a class, dictionary, list, tuple, enum, or what have you, to your liking, and use PickledObjectField to store and retrieve the whole beast in one step.
That approach makes sense if you will never manipulate parts individually, only the complex object as a whole.
Real life example
In my application there are RQdef objects that represent essentially a type with a certain basic structure (if you are curious what they mean, look here).
RQdefs consist of several Aspects and some fixed attributes.
Aspects consist of one or more Facets and some fixed attributes.
Facets consist of two or more Levels and some fixed attributes.
Levels consist of a few fixed attributes.
Overall, a typical RQdef will have about 20-40 parts.
An RQdef is always completely constructed in a single step before it is stored in the database and it is henceforth never modified, only read (but read frequently).
PickledObjectField is more convenient and much more efficient for this purpose than would be a set of four models and 20-40 objects for each RQdef.
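A minimal sketch of that one-field approach with django-picklefield (the model and field names are illustrative, not from the original answer):

from django.db import models

from picklefield.fields import PickledObjectField

class StoredRQdef(models.Model):
    name = models.CharField(max_length=100)
    # the whole nested structure (aspects -> facets -> levels) is
    # written once and read back in one step, never queried by part
    definition = PickledObjectField()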

Are there memory efficiencies gained when code is wrapped in functions?

I have been working on some code. My usual approach is to first solve all of the pieces of the problem, creating the loops and other bits of code I need as I work through it; then, if I expect to reuse the code, I go back through it and group the parts that belong together into functions.
I have just noticed that creating functions and calling them seems to be much more memory-efficient than writing the lines of code inline and deleting containers as I finish with them.
For example:

def someFunction(aList):
    # ... do things to aList that create a dictionary ...
    return aDict

seems to release more memory at the end than:

# ... do things to aList that create a dictionary ...
del aList
Is this expected behavior?
EDIT: added example code.
When this function finishes running, PF usage shows an increase of about 100 MB; filingList has about 8 million lines.
from collections import defaultdict  # cikDICT below needs this import

def getAllCIKS(filingList):
    cikDICT = defaultdict(int)
    for filing in filingList:
        if filing.startswith('.'):
            del filing
            continue
        cik = filing.split('^')[0].strip()
        cikDICT[cik] += 1
        del filing
    ciklist = cikDICT.keys()
    ciklist.sort()
    return ciklist

allCIKS = getAllCIKS(open(r'c:\filinglist.txt').readlines())
If I run this instead, I see an increase of almost 400 MB:
cikDICT = defaultdict(int)
for filing in open(r'c:\filinglist.txt').readlines():
    if filing.startswith('.'):
        del filing
        continue
    cik = filing.split('^')[0].strip()
    cikDICT[cik] += 1
    del filing
ciklist = cikDICT.keys()
ciklist.sort()
del cikDICT
EDIT
I have been playing around with this some more today. My observation and question should be refined a bit, since my focus has been on the PF usage. Unfortunately, I can only poke at this between my other tasks. However, I am starting to wonder about references versus copies. If I create a dictionary from a list, does the dictionary hold copies of the values that came from the list, or does it hold references to the values in the list? My bet is that the values are copied instead of referenced.
Another thing I noticed is that items in the gc list were items from containers that had been deleted. Does that make sense? So I have a list, and suppose each of the items in the list was [(aTuple), anInteger, [another list]]. When I started learning how to manipulate and inspect gc objects, I found those objects in the gc even though the list had been forcefully deleted, and even though I passed the values 0, 1 and 2 to the method (whose name I don't remember) to try to delete them as well.
I appreciate the insights people have been sharing. I am always interested in figuring out how things work under the hood.
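On the copies-versus-references question (a quick illustrative check, not from the original thread): a Python dict stores references to the same objects, not copies, so the list's strings stay alive for as long as the dict refers to them.

# building a dict from list items does not copy those items
items = ["x" * 1000, "y" * 1000]
d = {i: v for i, v in enumerate(items)}
print(d[0] is items[0])  # True: the dict holds the very same string
                         # object the list holds, not a copy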
Maybe you used some local variables in your function, which are implicitly released by reference counting at the end of the function, while they are not released at the end of your code segment?
You can use the Python garbage collector interface provided to more closely examine what (if anything) is being left around in the second case. Specifically, you may want to check out gc.get_objects() to see what is left uncollected, or gc.garbage to see if you have any reference cycles.
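For instance (a small illustrative snippet, not from the original answer):

import gc

gc.collect()  # force a full collection pass first
# objects the collector is still tracking after your code segment ran
print(len(gc.get_objects()))
# reference cycles the collector could not free (e.g. cycles involving
# __del__ methods in older Pythons) accumulate here
print(gc.garbage)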
Some extra memory is freed when you return from a function, but that's exactly as much extra memory as was allocated to call the function in the first place. In any case, if you're seeing a large difference, that's likely an artifact of the state of the runtime and not something you should really worry about. If you are running low on memory, the way to solve the problem is to keep more data on disk using things like B-trees (or just use a database), or to use algorithms that need less memory. Also, keep an eye out for making unnecessary copies of large data structures.
The real memory savings in creating functions is in your short-term memory. By moving something into a function, you reduce the amount of detail you need to remember by encapsulating part of the minutia away.
Maybe you should re-engineer your code to get rid of unnecessary variables (which may not be freed instantly)... how about the following snippet?

myfile = open(r"c:\filinglist.txt")
ciklist = sorted(set(x.split("^")[0].strip() for x in myfile if not x.startswith(".")))
EDIT: I don't know why this answer was downvoted... Maybe because it's short? Or maybe because the voter couldn't see that this one-liner does the same as the code in the question without creating unnecessary temporary containers?
Sigh...
I asked another question about copying lists, and the answers, particularly the one directing me to look at deepcopy, got me thinking about some dictionary behavior. The problem I was experiencing had to do with the fact that the original list is never garbage collected, because the dictionary maintains references to its items. I need to read up on weakref in the Python docs.
Objects referenced by dictionaries seem to stay alive. I think (but am not sure) that pushing the dictionary out of the function forces a copy, which kills the object. This is not complete; I need to do some more research.
