New Python object instance already containing data - python

ANSWERED! :)
I have to create a function that:
- initializes a new object,
- creates and adds data to that object,
- returns the object containing the data.
The object basically contains a JSON dict (GeoJSON) and offers an object-oriented way of working with GeoJSON data.
In the python script in question, I create multiple instances of the class Feature.
However, for some reason, the second Feature object I create already has the attributes of the first Feature object, even though it lives at a different memory address and nothing is passed to it (obj_1 and obj_2 are created in different, completely unrelated functions).
I am relatively new to OOP with Python so maybe I am missing something obvious.
This is also my first question on stackoverflow so just keep that in mind :)
I just can't wrap my head around this problem though.
This is the code I use (I will only give you the Feature init, if you need more I'll happily provide more!):
class Feature:
    def __init__(self, featuretype: str="Point", coordinates: list=[], \
                 properties: dict={}, dictionary: dict={}):
        # basically loads a provided dictionary or auto-creates one in json format
        if dictionary:
            self.dict = dictionary
        else:
            self.new(featuretype, coordinates, properties)
        self.update()
First function (outside class):
def create_checkerboard(extent, dist_x, dist_y) -> bk.Feature:
    checker_ft = bk.Feature(featuretype="MultiPoint") #init
    checker_ft.gen_grid_adv(extent, dist_x, dist_y, matrixname="checkerboard") #populate
    return checker_ft
The returned checker_ft object now has a checker_ft.dict which is now populated with a MultiPoint grid.
All is well.
Second function (outside class):
def random_grid(extent, grid_spacing_x, grid_spacing_y, shift) -> bk.Feature:
    shift_ft = bk.Feature(featuretype="MultiPoint") #init
    shift_ft.gen_grid(extent, grid_spacing_x, grid_spacing_y) #populate
    return shift_ft
Now, for a reason which is obviously beyond me, the shift_ft.dict contains data from the checker_ft.
And both objects are at different RAM locations:
<bk_functions.Feature object at 0x000001A726A3CF70>
<bk_functions.Feature object at 0x000001A726A1F190>
I hope that this is a simple oversight on my part.
Thank you for your kind attention!
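For readers who land here with the same symptom: given the signature above, a likely cause is Python's mutable default arguments. Default values such as [] and {} are evaluated once, when the function is defined, so every call that omits the argument receives the same list or dict object. Whether that is what happens here depends on what new() does with coordinates and properties, which the question does not show. The sketch below uses two hypothetical stand-in classes (SharedFeature and SafeFeature, not the real Feature) to show the effect and the usual None-sentinel fix.

```python
from typing import Optional


class SharedFeature:
    """Hypothetical stand-in: stores the default list directly (the pitfall)."""

    def __init__(self, coordinates: list = []):
        # The default list was created once, at definition time,
        # so every instance built without arguments shares it.
        self.coordinates = coordinates


class SafeFeature:
    """Hypothetical stand-in: uses None as a sentinel (the usual fix)."""

    def __init__(self, coordinates: Optional[list] = None):
        # A fresh list is built for each instance that omits the argument.
        self.coordinates = [] if coordinates is None else coordinates


a = SharedFeature()
a.coordinates.append((1.0, 2.0))
b = SharedFeature()
shared_leak = b.coordinates      # b already "contains" a's data

c = SafeFeature()
c.coordinates.append((1.0, 2.0))
d = SafeFeature()
safe_result = d.coordinates      # d starts empty
```

Note that a and b are still two distinct objects at different addresses; only the list they point to is shared, which matches the symptom described above.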

Related

Python: can an object have an object as a "default" representation?

I am just getting started with OOP, so I apologise in advance if my question is as obvious as 2+2. :)
Basically I created a class that adds attributes and methods to a pandas DataFrame. That's because I am sometimes looking to do complex but repetitive tasks like merging with a bunch of other tables, dropping duplicates, etc. So it's pleasant to be able to do that in just one go with a predefined method. For example, I can create an object like this:
mysupertable = MySupperTable(original_dataframe)
And then do:
mysupertable.complex_operation()
Where original_dataframe is the original pandas DataFrame, which is stored as an attribute on the object. Now, this is all good and well, but if I want to print (or just access) that original data frame I have to do something like
print(mysupertable.original_dataframe)
Is there a way to have that happening "by default" so if I just do:
print(mysupertable)
it will print the original data frame, rather than the memory location?
I know there are the __str__ and __repr__ methods that can be implemented in a class to return default string representations for an object. I was just wondering if there was a similar magic method (or something else) to show a particular attribute by default. I tried looking this up, but I think I am not finding the right words to describe what I want to do, because I can't seem to find an answer.
Thank you!
Cheers
In your MySupperTable class, do:
class MySupperTable:
    # ... other stuff in the class
    def __str__(self) -> str:
        return str(self.original_dataframe)
That will make it so that when a MySupperTable is converted to a str, it will convert its original_dataframe to a str and return that.
When you pass an object to print(), it prints the object's string representation, which under the hood is retrieved by calling the object's __str__() method. You can give this method a custom definition the same way you would define any other method.
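As a small self-contained illustration of the same idea, the sketch below delegates both __str__ and __repr__ to a wrapped object. TableWrapper and its original attribute are hypothetical names, and a plain list stands in for the DataFrame so the example runs without pandas.

```python
class TableWrapper:
    """Hypothetical wrapper; `original` stands in for the wrapped DataFrame."""

    def __init__(self, original):
        self.original = original

    def __str__(self):
        # Used by print() and str()
        return str(self.original)

    def __repr__(self):
        # Used by the interactive prompt and when shown inside containers
        return f"TableWrapper({self.original!r})"


wrapped = TableWrapper([1, 2, 3])
printed = str(wrapped)        # what print(wrapped) would display
in_list = repr([wrapped])     # containers show their items with __repr__
```

Defining only __str__ changes what print() shows; defining __repr__ as well also covers the interactive prompt and objects displayed inside lists or dicts.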

Question about the performance of a home-brewed mutable string object in Python

I'm creating a simple Unix console text scroller. My approach is to continually append to a deque whose maxlen is set to the screen width. Every time I clear the console and append a new character to the deque, I join the contents (''.join(deque)) into a string to print to the console. I know that every time I do this I am creating a new object. The text files I am feeding into the scroller are huge, and I'm wondering whether the constant creation of string objects in this manner is grossly inefficient when it comes to memory.
I thought about using StringIO, but every time I call getvalue() it also creates a new object. It seems that any way I approach it, printing to the console creates a new object with a unique id. So I created a simple MutableString class. Using the __str__() method, I can print to the console, and the class object appears to have only two ids associated with it: one for the class object itself and one for the object created when __str__() is called by print().
My question is: is this approach actually more efficient than calling ''.join(deque) as I was doing? Is the simple class I created actually mutable and therefore saving resources? If not, what might be a computationally inexpensive way to add characters to a string? Thanks for your time and attention!
from collections import deque
class MutableString:
    def __init__(self, string):
        self.string = deque(string)
    def __repr__(self):
        return f'mutable_string.MutableString({self.__str__()})'
    def __str__(self):
        return ''.join(self.string)
    def __add__(self, other):
        self.string.append(other)
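Since Python's str is immutable, anything that ends up being printed has to be built as a new string at some point, and the class above does the same through ''.join() inside __str__. A minimal sketch of the plain deque approach follows, under the assumption that only the visible window is kept; scroll_frames is a hypothetical helper name. Each joined string is at most width characters long, so the per-frame cost depends on the screen width, not on the size of the input file.

```python
from collections import deque


def scroll_frames(text, width):
    """Yield the visible window after each character is appended."""
    window = deque(maxlen=width)   # oldest characters drop off the left automatically
    for ch in text:
        window.append(ch)
        # join() builds a new string, but never longer than `width` characters
        yield "".join(window)


frames = list(scroll_frames("hello", 3))
```

In a real scroller you would iterate over the generator and print each frame instead of collecting them in a list. Note also that __add__ as written above mutates the deque and returns None, so an expression such as s = s + 'x' would rebind s to None.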

Python - how to create graph of variable assignment?

In a sample python class function, I have one or more class items that have arbitrary type and constructor signatures that all have a single return value and one or more original parameters to the function. Additionally, I have the possibility of using the output of a given member object as the input to another member object:
class Blah(...):
    def __init__(...):
        ...
    def myfunc(self, param1, param2, ..., param_n):
        r1 = self.obj1(param1, ...)
        ...
        r_n = self.obj_n(param1, r1, ...)
What I need to know is, is there a way to instrument python to track edges between input and output of each invocation of a given set of tracked objects?
For example, as in the above, the result would be a graph: (param1...) -> r1, and (param1,r1...) -> r_n
The actual edge direction doesn't matter so long as the input-output relationship is consistent.
You could trace the function, and create a mapping of every function call.
An example of this is PyTorch's ONNX export capability, which uses this technique. If that's not enough, you could resort to the Python debugger API, or instrument all items within a module by using the inspect module.
import inspect
inspect.getmembers(your_module, inspect.isfunction)
By creating a class and defining __call__ with the *args, **kwargs convention, you can match the signature of any object or function that you wrap with it. Then, when you iterate over the members of a module, you can wrap each member and re-assign it to a class instance that reads the function's metadata or dynamic type information (f.__name__ or otherwise). You can then track the arguments (keeping names stable with some unique-id generation scheme) and function names, and build a graph directly from them.
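One possible shape for such a wrapper is sketched below. It is only an illustration under stated assumptions: Node, Tracer and track are hypothetical names, tracked values are passed around as labelled Node objects, and results are labelled r1, r2, ... by a simple counter. It is not how PyTorch's tracer works internally.

```python
import itertools


class Node:
    """A value plus a label that identifies it in the graph (hypothetical)."""

    def __init__(self, value, label):
        self.value = value
        self.label = label


class Tracer:
    """Records one (function name, input labels, output label) edge per call."""

    def __init__(self):
        self.edges = []
        self._counter = itertools.count(1)

    def track(self, func):
        def wrapper(*args, **kwargs):
            # Labels of every Node passed in, positional first, then keyword
            inputs = [a for a in list(args) + list(kwargs.values())
                      if isinstance(a, Node)]
            # Unwrap Nodes so the original function sees plain values
            raw_args = [a.value if isinstance(a, Node) else a for a in args]
            raw_kwargs = {k: (v.value if isinstance(v, Node) else v)
                          for k, v in kwargs.items()}
            out = Node(func(*raw_args, **raw_kwargs), f"r{next(self._counter)}")
            self.edges.append(
                (func.__name__, tuple(n.label for n in inputs), out.label))
            return out
        return wrapper


def add(a, b):
    return a + b


def scale(a, b):
    return a * b


tracer = Tracer()
add = tracer.track(add)
scale = tracer.track(scale)

p1, p2 = Node(2, "param1"), Node(3, "param2")
r1 = add(p1, p2)
r2 = scale(p1, r1)
```

After these calls, tracer.edges holds one entry per invocation, which corresponds to the (param1...) -> r1 and (param1, r1...) -> r_n relationships from the question.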

Use python dict to lookup mutable objects

I have a bunch of File objects, and a bunch of Folder objects. Each folder has a list of files. Now, sometimes I'd like to lookup which folder a certain file is in. I don't want to traverse over all folders and files, so I create a lookup dict file -> folder.
folder = Folder()
myfile = File()
folder_lookup = {}
# This is pseudocode, I don't actually reach into the Folder
# object, but have an appropriate method
folder.files.append(myfile)
folder_lookup[myfile] = folder
Now, the problem is, the files are mutable objects. My application is built around that fact. I change properties on them, and the GUI is notified and updated accordingly. Of course you can't put mutable objects in dicts. So what I tried first is to generate a hash based on the current content, basically:
def __hash__(self):
    return hash((self.title, ...))
This didn't work of course, because when the object's contents changed its hash (and thus its identity) changed, and everything got messed up. What I need is an object that keeps its identity, although its contents change. I tried various things, like making __hash__ return id(self), overriding __eq__, and so on, but never found a satisfying solution. One complication is that the whole construction should be picklable, so that means I'd have to store id on creation, since it could change when pickling, I guess.
So I basically want to use the identity of an object (not its state) to quickly look up data related to the object. I've actually found a really nice pythonic workaround for my problem, which I might post shortly, but I'd like to see if someone else comes up with a solution.
I felt dirty writing this. Just put folder as an attribute on the file.
class dodgy(list):
    def __init__(self, title):
        self.title = title
        super(dodgy, self).__init__()
        # a fresh class object per instance; its hash does not depend on the list's contents
        self.store = type("store", (object,), {"blanket" : self})
    def __hash__(self):
        return hash(self.store)
innocent_d = {}
dodge_1 = dodgy("dodge_1")
dodge_2 = dodgy("dodge_2")
innocent_d[dodge_1] = dodge_1.title
innocent_d[dodge_2] = dodge_2.title
print(innocent_d[dodge_1])
dodge_1.extend(range(5))
dodge_1.title = "oh no"
print(innocent_d[dodge_1])
OK, everybody noticed the extremely obvious workaround (that took me some days to come up with): just put an attribute on File that tells you which folder it is in. (Don't worry, that is also what I did.)
But, it turns out that I was working under wrong assumptions. You are not supposed to use mutable objects as keys, but that doesn't mean you can't (diabolic laughter)! The default implementation of __hash__ returns a unique value, probably derived from the object's address, that remains constant in time. And the default __eq__ follows the same notion of object identity.
So you can put mutable objects in a dict, and they work as expected (if you expect equality based on instance, not on value).
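A minimal sketch of that behaviour, using bare-bones File and Folder classes as hypothetical stand-ins for the real ones: neither class defines __hash__ or __eq__, so the defaults inherited from object apply.

```python
class File:
    def __init__(self, title):
        self.title = title


class Folder:
    def __init__(self):
        self.files = []


folder = Folder()
myfile = File("draft.txt")
folder.files.append(myfile)
folder_lookup = {myfile: folder}

myfile.title = "final.txt"        # mutate the object used as a key
found = folder_lookup[myfile]     # still found: hash and equality use identity
lookalike = File("final.txt")     # same state, but a different object
```

The lookup keeps working after the mutation, and an object with identical contents is not treated as the same key.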
See also: I'm able to use a mutable object as a dictionary key in python. Is this not disallowed?
I was having problems because I was pickling/unpickling the objects, which of course changed the hashes. One could generate a unique ID in the constructor, and use that for equality and deriving a hash to overcome this.
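One way this could look is sketched below, assuming uuid4 is an acceptable source of unique IDs; the _uid attribute name is made up for the example. Because the ID is stored as ordinary instance state, it survives a pickle round trip, so the restored copy hashes and compares equal to the original.

```python
import pickle
import uuid


class File:
    def __init__(self, title):
        self.title = title
        # Generated once per object and pickled along with the other attributes
        self._uid = uuid.uuid4().hex

    def __eq__(self, other):
        return isinstance(other, File) and self._uid == other._uid

    def __hash__(self):
        # Depends only on the stable ID, never on mutable state such as title
        return hash(self._uid)


myfile = File("notes.txt")
folder_lookup = {myfile: "folder A"}

myfile.title = "renamed.txt"                      # mutation does not affect the hash
restored = pickle.loads(pickle.dumps(myfile))     # a new object with the same _uid
```

Here restored is a different object from myfile, yet folder_lookup[restored] still resolves, because equality and hashing follow the stored ID instead of the memory address.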
(For the curious, as to why such a "lookup based on instance identity" dict might be necessary: I've been experimenting with a kind of "object database". You have pure Python objects, put them in lists/containers, and can define indexes on attributes for faster lookup, complex queries and so on. For foreign keys (1:n relationships) I can just use containers, but for the backlink I have to come up with something clever if I don't want to modify the objects on the n side.)

How to create a class from function

I am still struggling with understanding classes, I am not certain but I have an idea that this function I have created is probably a good candidate for a class. The function takes a list of dictionaries, identifies the keys and writes out a csv file.
First Q: is this function a good candidate for a class? (I write out a lot of CSV files.)
Second Q: if the answer to the first is yes, how do I do it?
Third Q: how do I use the instances of the class? (Did I say that right?)
import csv
def writeCSV(dictList,outfile):
    maxLine=dictList[0]
    for item in dictList:
        if len(item)>len(maxLine):
            maxLine=item
    dictList.insert(0,dict( (key,key) for key in maxLine.keys()))
    csv_file=open(outfile,'ab')
    writer = csv.DictWriter(csv_file,fieldnames=[key for key in maxLine.keys()],restval='notScanned',dialect='excel')
    for dataLine in dictList:
        writer.writerow(dataLine)
    csv_file.close()
    return
The main idea behind objects is that an object is data plus methods.
Whenever you are thinking about making something an object, you must ask yourself what will be the object's data, and what operations (methods) will you want to perform on that data.
Functions translate more readily to methods than to classes.
So, for instance, if your dictList is data upon which you often call writeCSV, then perhaps make a DictList object with a writeCSV method:
class DictList(object):
    def __init__(self,data):
        self.data=data
    def writeCSV(self,outfile):
        maxLine=self.data[0]
        for item in self.data:
            if len(item)>len(maxLine):
                maxLine=item
        self.data.insert(0,dict( (key,key) for key in maxLine.keys()))
        csv_file=open(outfile,'ab')
        writer = csv.DictWriter(
            csv_file,fieldnames=[key for key in maxLine.keys()],
            restval='notScanned',dialect='excel')
        for dataLine in self.data:
            writer.writerow(dataLine)
        csv_file.close()
Then you could instantiate a DictList object:
dl=DictList([{},{},...])
dl.writeCSV(outfile)
Doing this might make sense if you have more methods that could operate on the same DictList.data. Otherwise, you'd probably be better off sticking with the original function.
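The code above is written for Python 2, where CSV files are opened in binary mode ('ab'). For readers on Python 3, a rough equivalent might look like the sketch below. It assumes Python 3, where the csv module expects a text-mode file opened with newline=''; write_csv is a renamed, hypothetical variant of writeCSV that uses writeheader() instead of inserting a header row into the data.

```python
import csv


class DictList:
    def __init__(self, data):
        self.data = data

    def write_csv(self, outfile):
        # As in the original, the widest dictionary supplies the column names
        fieldnames = list(max(self.data, key=len).keys())
        # Text mode with newline="" is what the csv module expects on Python 3
        with open(outfile, "a", newline="") as csv_file:
            writer = csv.DictWriter(
                csv_file, fieldnames=fieldnames,
                restval="notScanned", dialect="excel")
            writer.writeheader()
            writer.writerows(self.data)
```

Usage stays the same as above, for example DictList(rows).write_csv("scan.csv"), where rows and the file name are placeholders. Unlike the original, this variant leaves self.data unmodified, so calling it twice does not add a second header row to the data itself (the file is still opened in append mode, as in the original).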
To do this, you first need to understand a little about the concepts behind classes, and then follow the next step.
I faced the same problem and followed this LINK. I'm sure you will also be able to start working with classes, coming from structured programming.
If you want to write a lot of CSV files with the same dictList (is that what you're saying...?), turning the function into a class would let you perform initialization just once, and then write repeatedly from the same initialized instance. E.g., with other minor opts:
class CsvWriter(object):
    def __init__(self, dictList):
        self.maxLine = max(dictList, key=len)
        self.dictList = [dict((k,k) for k in self.maxLine)]
        self.dictList.extend(dictList)
    def doWrite(self, outfile):
        csv_file=open(outfile,'ab')
        writer = csv.DictWriter(csv_file,
                                fieldnames=self.maxLine.keys(),
                                restval='notScanned',
                                dialect='excel')
        for dataLine in self.dictList:
            writer.writerow(dataLine)
        csv_file.close()
This seems a dubious use case, but if it does match your desire, then you'd instantiate and use this class as follows...:
cw = CsvWriter(dataList)
for ou in many_outfiles:
    cw.doWrite(ou)
When thinking about making objects, remember this:
Classes have attributes - things that describe different instances of the class differently
Classes have methods - things that the objects do (often involving using their attributes)
Objects and classes are wonderful, but the first thing to keep in mind is that they are not always necessary, or even desirable.
That said, in answer to your first question, this doesn't seem like a particularly good candidate for a class. The only things that differ between the CSV files you're writing are the data and the file you write to, and the only thing you do with them (i.e., the only method you would have) is the function you've already written.
Even though the first answer is no, it's still instructive to see how a class is built.
class CSVWriter:
    # this function is called when you create an instance of the class
    # it sets up the initial attributes of the instance
    def __init__(self, dictList, outFile):
        self.dictList = dictList
        self.outFile = outFile
    def writeCSV(self):
        # basically exactly what you have above, except you can use the instance's
        # own variables (ie, self.dictList and self.outFile) instead of the local
        # variables
        pass
For your final question - the first step to using an instance of a class (an individual object, if you will) is to create that instance:
myCSV = CSVWriter(dictList, outFile)
When the object is created, __init__ is called with the arguments you gave it, which allows your object to have its own data. Now you can access any of the attributes or methods that your myCSV object has with the '.' operator:
myCSV.writeCSV()
print "Wrote a file to", myCSV.outFile
One way to think about objects versus functions is that objects are generally nouns (e.g., I created a CSVWriter), while functions are verbs (e.g., you wrote the function that writes CSV files). If you're just doing something over and over again, without re-using any of the same data, a function by itself is fine. But if you have lots of related data, and part of it gets changed in the course of the action, classes may be a good idea.
I don't think your writeCSV needs a class. Typically a class is used when you have to update some state (data) and then act on it, perhaps with various options.
For example, if you need to pass your object around so that other functions or methods can add values to it, or your final action/output function has many options, or you think the same data can be processed or acted upon in many ways.
A typical practical case: if you have multiple functions that act on the same data, or a single function whose optional parameter list is getting too long, you may think of converting it into a class.
If in your case you had various options and needed to insert data in increments, you should make it a class.
Usually class name would be noun, so function(verb) writeCSV -> class(noun) CSVWriter
class CSVWriter(object):
    def __init__(self, init-params...):
        self.data = {}
    def addData(self, data):
        self.data.update(data)
    def dumpCSV(self, filePath):
        ...
    def dumpJSON(self, filePath):
        ....
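To make the skeleton above concrete, here is one way it might be filled in. This is a sketch under assumptions, not the answerer's implementation: it targets Python 3, stores the data as a list of row dictionaries (the skeleton uses a single dict), and uses snake_case method names (add_data, dump_csv, dump_json) and a restval parameter that are made up for the example.

```python
import csv
import json


class CSVWriter:
    def __init__(self, restval="notScanned"):
        self.rows = []            # state that accumulates between calls
        self.restval = restval

    def add_data(self, rows):
        # Data can be added in increments before anything is written
        self.rows.extend(rows)

    def _fieldnames(self):
        # Union of all keys, in first-seen order
        names = []
        for row in self.rows:
            for key in row:
                if key not in names:
                    names.append(key)
        return names

    def dump_csv(self, path):
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(
                f, fieldnames=self._fieldnames(), restval=self.restval)
            writer.writeheader()
            writer.writerows(self.rows)

    def dump_json(self, path):
        with open(path, "w") as f:
            json.dump(self.rows, f, indent=2)
```

The class earns its keep here because the same accumulated rows feed two different outputs; with a single output and no incremental updates, the plain function remains the simpler choice.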
I think question 1 is pretty crucial as it goes to the heart of what a class is.
Yes, you can put this function in a class. A class is a set of functions (called methods) and data together in one logical unit. As other posters noted, probably overkill to have a class with one method.
