I want to read a text file, manipulate the fields a bit, and load them into instance variables for an object. Each row of the text would be stored in one object, so reading the whole file should return a list of objects.
Here's an example of the file:
L26 [coords]704:271[/coords] (1500)
L23 [coords]681:241[/coords] (400)
L20 [coords]709:229[/coords] (100)
And here's part of the current class definition:
class Poi(object):
    '''Points of Interest have a location, level and points'''
    def __init__(self, level, coords, points):
        self.level = level
        self.coordinates = coords
        self.points = points
I'm new to this, and probably overthinking it by a lot, but it seems like the method to read and write the list of Pois should be part of the Poi class. Is there a correct way to do that, or is the right answer to have a separate function like this one?
def load_poi_txt(source_file, source_dir):
    poi_list = []
    pass
    return poi_list
Both are correct, depending on what you want. Here's the method skeleton:
class Poi(object):
    ...
    @classmethod
    def load_from_txt(cls, source_file, source_dir):
        res = []
        while (still more to find):  # pseudocode: loop over the rows of the file
            # find level, coords, and points
            res.append(cls(level, coords, points))
        return res
Note how it uses cls, which is the class the method is called on. Here that is Poi, but it could just as easily be a subclass of Poi defined later, without needing to change the method itself.
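To make the skeleton concrete, here is a minimal runnable sketch. It assumes every row looks exactly like the sample file ("L26 [coords]704:271[/coords] (1500)"), that the fields should become integers, and that coords is stored as an (x, y) tuple. The regular expression and the helper load_from_lines are my own illustrative names, not part of the original code.

```python
import os
import re

# Assumed row format, taken from the sample file:
# "L26 [coords]704:271[/coords] (1500)"
LINE_RE = re.compile(r"L(\d+) \[coords\](\d+):(\d+)\[/coords\] \((\d+)\)")


class Poi(object):
    '''Points of Interest have a location, level and points'''

    def __init__(self, level, coords, points):
        self.level = level
        self.coordinates = coords
        self.points = points

    @classmethod
    def load_from_lines(cls, lines):
        # Hypothetical helper: build one instance per row that matches the pattern.
        res = []
        for line in lines:
            match = LINE_RE.match(line.strip())
            if match is None:
                continue  # skip blank or malformed rows
            level, x, y, points = (int(g) for g in match.groups())
            res.append(cls(level, (x, y), points))
        return res

    @classmethod
    def load_from_txt(cls, source_file, source_dir):
        # Open the file and hand its rows to the parser above.
        with open(os.path.join(source_dir, source_file)) as f:
            return cls.load_from_lines(f)
```

Because both methods build instances through cls, calling the loader on a subclass of Poi returns instances of that subclass.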
Beginner-level question: I am quite new and want to learn to use unit tests in my code. I've watched some tutorials on how to write unit tests, but when it comes to practicing on my own code, I start to wonder how to do it properly and avoid learning bad habits.
I have a class Geometry that will be inherited by other classes. It uses an imported custom object (the namedtuple "Point") and a list from a JSON configuration file, but that list will be provided by another part of the code. Here are my questions:
Should my unit tests only check the methods create_geometry and calculate_drag_surfaces, or also everything mentioned above and instance creation through the __init__ method?
When I write a unit test for a method that changes an instance property, as create_geometry does, what should the asserts look like? Should I check the value of the changed instance property, or is there a way to test it "in place" without creating a new instance?
How should I write unit tests for protected or hidden methods? Is there any difference?
If you find any issues in my code, I'm open to suggestions. I don't have any commercial experience and want to learn as much as I can. Below is the code I want to test with unittest.
from point import Point

class Geometry:
    """
    Geometry class - object that will keep geometry dependent information necessary for
    graphic render and physic calculations
    """
    def __init__(self, geometry_points_cords, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.geometry_points = []  # list of named tuples with co-ords on flat surface
        self.__create_geometry(geometry_points_cords)
        self.x_drag_surface = None
        self.y_drag_surface = None
        self.__calculate_drag_surfaces()

    def __create_geometry(self, geometry_points_cords):
        """
        Method that will convert provided geometry points into namedtuples that describe
        geometry on x/y plane
        :param geometry_points_cords:
        :return:
        """
        for geometry_cords in geometry_points_cords:
            self.geometry_points.append(Point(geometry_cords[0], geometry_cords[1]))

    def __calculate_drag_surfaces(self):
        """
        Method that will calculate drag surfaces in each axis base on geometry
        :return:
        """
        x_cords = []
        y_cords = []
        for single_point in self.geometry_points:
            x_cords.append(single_point.x)
            y_cords.append(single_point.y)
        self.x_drag_surface = (max(x_cords) - min(x_cords))**2
        self.y_drag_surface = (max(y_cords) - min(y_cords))**2
Is the interface the two fields x_drag_surface and y_drag_surface? Then you should primarily test that those get the proper values.
geometry = Geometry(some_coordinates)
assert geometry.x_drag_surface == correct_x_drag_surface
assert geometry.y_drag_surface == correct_y_drag_surface
As the code is written now you can not test __create_geometry and __calculate_drag_surfaces separately since they will both be run by the constructor. You can extract them from the class, though, and make them testable:
def make_points(coordinates):
    """
    Function that will convert provided geometry points into namedtuples that describe geometry on x/y plane
    :param coordinates:
    :return:
    """
    return [Point(x, y) for (x, y) in coordinates]

def calculate_drag_surfaces(points):
    """
    Function that will calculate drag surfaces in each axis based on geometry
    :param points:
    :return:
    """
    x_coords = [p.x for p in points]
    y_coords = [p.y for p in points]
    x_drag_surface = (max(x_coords) - min(x_coords))**2
    y_drag_surface = (max(y_coords) - min(y_coords))**2
    return x_drag_surface, y_drag_surface

class Geometry:
    """
    Geometry class - object that will keep geometry dependent information necessary for
    graphic render and physic calculations
    """
    def __init__(self, coordinates, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.geometry_points = make_points(coordinates)  # list of named tuples with co-ords on flat surface
        self.x_drag_surface, self.y_drag_surface = calculate_drag_surfaces(self.geometry_points)
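Once the two functions are extracted, each one can get its own small test. The following is only a sketch: Point here is a stand-in namedtuple with fields x and y (an assumption about the asker's point module), the two functions are repeated so the example runs on its own, and the test class name is made up.

```python
import unittest
from collections import namedtuple

# Stand-in for the question's imported namedtuple (assumed fields: x, y).
Point = namedtuple("Point", ["x", "y"])


def make_points(coordinates):
    return [Point(x, y) for (x, y) in coordinates]


def calculate_drag_surfaces(points):
    x_coords = [p.x for p in points]
    y_coords = [p.y for p in points]
    return ((max(x_coords) - min(x_coords)) ** 2,
            (max(y_coords) - min(y_coords)) ** 2)


class GeometryFunctionsTest(unittest.TestCase):
    def test_make_points(self):
        self.assertEqual(make_points([(0, 1), (2, 3)]),
                         [Point(0, 1), Point(2, 3)])

    def test_calculate_drag_surfaces(self):
        points = [Point(0, 0), Point(3, 1), Point(1, 5)]
        # x spans 0..3, so 3**2 = 9; y spans 0..5, so 5**2 = 25
        self.assertEqual(calculate_drag_surfaces(points), (9, 25))
```

Saved in a test module, this can be run with python -m unittest. Each test checks a return value, so no Geometry instance is needed.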
In my application, I need a fast look up of attributes. Attributes are in this case a composition of a string and a list of dictionaries. These attributes are stored in a wrapper class. Let's call this wrapper class Plane:
class Plane(object):
    def __init__(self, name, properties):
        self.name = name
        self.properties = properties

    @classmethod
    def from_idx(cls, idx):
        if idx == 0:
            return cls("PaperPlane", [{"canFly": True}, {"isWaterProof": False}])
        if idx == 1:
            return cls("AirbusA380", [{"canFly": True}, {"isWaterProof": True}, {"hasPassengers": True}])
To make this class easier to play with, I added a simple classmethod that constructs instances from an integer.
So now in my application I have many Planes, of the order of 10,000,000. Each of these planes can be accessed by a universal unique id (uuid). What I need is a fast lookup: given an uuid, what is the Plane. The natural solution is a dict. A simple class to generate planes with uuids in a dict and to store this dict in a file may look like this:
import gzip
import pickle
import uuid

import numpy as np

class PlaneLookup(object):
    def __init__(self):
        self.plane_dict = {}

    def generate(self, n_planes):
        for i in range(n_planes):
            plane_id = uuid.uuid4().hex
            self.plane_dict[plane_id] = Plane.from_idx(np.random.randint(0, 2))

    def save(self, filename):
        with gzip.open(filename, 'wb') as f:
            pickle.dump(self.plane_dict, f, pickle.HIGHEST_PROTOCOL)

    @classmethod
    def from_disk(cls, filename):
        pl = cls()
        with gzip.open(filename, 'rb') as f:
            pl.plane_dict = pickle.load(f)
        return pl
So what happens if I generate some planes?
pl = PlaneLookup()
pl.generate(1000000)
What happens is that a lot of memory gets consumed! If I check the size of my pl object with the getsize() method from this question, I get a value of 1,087,286,831 bytes on my 64-bit machine. Looking at htop, my memory demand seems to be even higher (around 2 GB).
This question explains quite well why Python dictionaries need so much memory.
However, I think this does not have to be the case in my application. The Plane objects created in the PlaneLookup.generate() method very often have the same attributes (i.e. the same name and the same properties). So it should be possible to store such an object once and, whenever the same object (same name, same properties) is created again, store only a reference to the already existing entry. As a simple Plane object has a size of 1147 bytes (according to the getsize() method), just saving references may save a lot of memory!
The question is now: how do I do this? In the end I need a function that takes a uuid as input and returns the corresponding Plane object as fast as possible and with as little memory as possible.
Maybe lru_cache can help?
Here is again the full code to play with:
https://pastebin.com/iTZyQQAU
Did you think about having another dictionary mapping idx -> plane? Then in self.plane_dict[plane_uuid] you would store just the idx instead of the object. This will save memory and speed up your app, though you'd need to modify the lookup method.
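A rough sketch of that idea, assuming only the two plane variants from the question exist. PLANE_TEMPLATES, plane_idx and get are names I made up for illustration, and I use the standard random module instead of numpy to keep the example dependency-free.

```python
import random
import uuid


class Plane(object):
    def __init__(self, name, properties):
        self.name = name
        self.properties = properties


# One shared Plane object per index (the idx -> plane dictionary).
PLANE_TEMPLATES = {
    0: Plane("PaperPlane", [{"canFly": True}, {"isWaterProof": False}]),
    1: Plane("AirbusA380", [{"canFly": True}, {"isWaterProof": True},
                            {"hasPassengers": True}]),
}


class PlaneLookup(object):
    def __init__(self):
        self.plane_idx = {}  # uuid -> small int index, not a Plane object

    def generate(self, n_planes):
        for _ in range(n_planes):
            # random.randint includes both ends, so this yields 0 or 1.
            self.plane_idx[uuid.uuid4().hex] = random.randint(0, 1)

    def get(self, plane_id):
        # Two dict lookups: uuid -> idx, then idx -> shared Plane object.
        return PLANE_TEMPLATES[self.plane_idx[plane_id]]
```

The trade-off is that every uuid with the same idx returns the very same Plane object, so changing that object changes it for all of them. This only fits if the planes are treated as read-only.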
I have a class in python, and I'd like to be able to do an operation to one object and have that operation in turn identify another object to change and in turn modify it. To make this concrete:
Say we have a bunch of locations (the objects), and each location has some things (call them elements). I would like to tell location 1 to move one of its elements to the right by location.move_right(element). It should then add that element to location 2. (in my actual problem, the "location" would need to calculate which location it will move to, and so I don't know a priori where it's going.)
Here is the working code I have. Within the class I've placed a dictionary that will hold all objects that have been created. I can imagine this could cause trouble for garbage collection etc. I suspect there is a better way. Is there?
class location(object):
    locations = {}
    def __init__(self, coordinate):
        self.locations[coordinate] = self
        self.coord = coordinate
        self.elements = set()
    def add_element(self, element):
        self.elements.add(element)
    def move_right(self, element):
        self.elements.remove(element)
        new_location = self.locations[self.coord+1]
        new_location.add_element(element)

x = location(1)
y = location(2)
x.add_element('r')
x.move_right('r')
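On the garbage-collection worry: one possible variation (an assumption on my part, not something established in this thread) is to keep the class-level registry but hold the objects through weakref.WeakValueDictionary from the standard library, so the registry alone does not keep a location alive. The class is renamed Location and the registry _registry purely for illustration.

```python
import weakref


class Location(object):
    # Weak registry: an entry disappears once nothing else references the Location.
    _registry = weakref.WeakValueDictionary()

    def __init__(self, coordinate):
        self.coord = coordinate
        self.elements = set()
        Location._registry[coordinate] = self

    def add_element(self, element):
        self.elements.add(element)

    def move_right(self, element):
        # Look up the target first, so the element is not lost
        # if there is no location to the right (KeyError).
        new_location = Location._registry[self.coord + 1]
        self.elements.remove(element)
        new_location.add_element(element)


x = Location(1)
y = Location(2)
x.add_element('r')
x.move_right('r')
```

With this version the caller has to keep its own references to the locations (as x and y do here), otherwise they may be collected and vanish from the registry.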
I have a project in which I run several data sets through a specific function that "cleans" them.
The cleaning function looks like this:
Misc.py
import sys

def clean(my_data):
    sys.stdout.write("Cleaning genes...\n")
    synonyms = FileIO("raw_data/input_data", 3, header=False).openSynonyms()
    clean_data = {}
    for g in list(my_data):  # iterate over a copy of the keys, since entries are deleted below
        if g in synonyms:
            # Found a data point which appears in the synonym list.
            #print synonyms[g]
            for synonym in synonyms[g]:
                if synonym in my_data:
                    del my_data[synonym]
                    clean_data[g] = synonym
                    sys.stdout.write("\t%s is also known as %s\n" % (g, clean_data[g]))
    return my_data
FileIO is a custom class I made to open files.
My question is: this function will be called many times throughout the program's life cycle. What I want is to avoid reading input_data every time, since it's going to be the same every time. I know that I can just return it, and pass it as an argument in this way:
def clean(my_data, synonyms=None):
    if synonyms is None:
        ...
    else:
        ...
But is there another, better looking way of doing this?
My file structure is the following:
lib
    Misc.py
    FileIO.py
    __init__.py
    ...
raw_data
runme.py
From runme.py, I do this from lib import * and call all the functions I made.
Is there a Pythonic way to go about this? Like a 'memory' for the function?
Edit:
This line: synonyms = FileIO("raw_data/input_data", 3, header=False).openSynonyms() returns a collections.OrderedDict built from input_data, using the 3rd column as the key of the dictionary.
The dictionary for the following dataset:
column1  column2  key  data
...      ...      A    B|E|Z
...      ...      B    F|W
...      ...      C    G|P
...
Will look like this:
OrderedDict([('A',['B','E','Z']), ('B',['F','W']), ('C',['G','P'])])
This tells my script that A is also known as B, E and Z, that B is also known as F and W, etc.
So these are the synonyms. Since the synonyms list will never change throughout the life of the program, I want to read it just once and reuse it.
Use a class with a __call__ operator. You can call objects of this class and store data between calls in the object. Some data probably can best be saved by the constructor. What you've made this way is known as a 'functor' or 'callable object'.
Example:
class Incrementer:
    def __init__ (self, increment):
        self.increment = increment
    def __call__ (self, number):
        return self.increment + number

incrementerBy1 = Incrementer (1)
incrementerBy2 = Incrementer (2)
print (incrementerBy1 (3))
print (incrementerBy2 (3))
Output:
4
5
[EDIT]
Note that you can combine the answer of @Tagc with my answer to create exactly what you're looking for: a 'function' with built-in memory.
Name your class Clean rather than DataCleaner and name the instance clean. Name the method __call__ rather than clean.
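Put together, that combination might look roughly like this. It is only a sketch: load_synonyms is a hypothetical zero-argument callable standing in for FileIO("raw_data/input_data", 3, header=False).openSynonyms, fake_loader and the sample data are invented for the demo, and the cleaning logic is simplified to just dropping the synonyms.

```python
class Clean(object):
    def __init__(self, load_synonyms):
        # load_synonyms is a hypothetical stand-in for
        # FileIO("raw_data/input_data", 3, header=False).openSynonyms
        self.synonyms = load_synonyms()  # read once, at construction

    def __call__(self, my_data):
        # Simplified cleaning: drop every key that is a synonym of another key.
        for g in list(my_data):
            if g in self.synonyms:
                for synonym in self.synonyms[g]:
                    my_data.pop(synonym, None)
        return my_data


calls = []  # records how often the synonyms are loaded


def fake_loader():
    calls.append(1)
    return {"A": ["B", "E", "Z"], "B": ["F", "W"], "C": ["G", "P"]}


clean = Clean(fake_loader)   # the synonyms are loaded here, once
print(clean({"A": 1, "B": 2, "X": 3}))
print(clean({"C": 1, "G": 2}))
print(len(calls))            # number of loads so far
```

Callers keep writing clean(my_data) as if it were a plain function, while the synonyms live on the instance between calls.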
Like a 'memory' for the function
Half-way to rediscovering object-oriented programming.
Encapsulate the data cleaning logic in a class, such as DataCleaner. Make it so that instances read synonym data once when instantiated and then retain that information as part of their state. Have the class expose a clean method that operates on the data:
class FileIO(object):
    def __init__(self, file_path, some_num, header):
        pass

    def openSynonyms(self):
        return []

class DataCleaner(object):
    def __init__(self, synonym_file):
        self.synonyms = FileIO(synonym_file, 3, header=False).openSynonyms()

    def clean(self, data):
        for g in data:
            if g in self.synonyms:
                # ...
                pass

if __name__ == '__main__':
    dataCleaner = DataCleaner('raw_data/input_file')
    dataCleaner.clean('some data here')
    dataCleaner.clean('some more data here')
As a possible future optimisation, you can expand on this approach to use a factory method to create instances of DataCleaner which can cache instances based on the synonym file provided (so you don't need to do expensive recomputation every time for the same file).
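One way such a caching factory could be sketched is with functools.lru_cache (Python 3). The function name get_data_cleaner is hypothetical, and the DataCleaner below is a reduced placeholder that skips the actual file read so the example runs on its own.

```python
from functools import lru_cache


class DataCleaner(object):
    def __init__(self, synonym_file):
        # Placeholder for the expensive FileIO(...).openSynonyms() read.
        self.synonym_file = synonym_file
        self.synonyms = {}


@lru_cache(maxsize=None)
def get_data_cleaner(synonym_file):
    # Hypothetical factory: builds one DataCleaner per distinct file path
    # and returns the cached instance on later calls with the same path.
    return DataCleaner(synonym_file)


first = get_data_cleaner('raw_data/input_file')
second = get_data_cleaner('raw_data/input_file')
print(first is second)  # True: the second call reuses the cached instance
```

The cache is keyed on the path string only, so it will not notice if the file's contents change on disk.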
I think the cleanest way to do this would be to decorate your "clean" (pun intended) function with another function that supplies the synonyms to the function. This is IMO cleaner and more concise than creating another custom class, yet it still allows you to easily change the "input_data" file if you need to (factory function):
def defineSynonyms(datafile):
    def wrap(func):
        # Read the synonyms once, when the function is decorated, not on every call.
        synonyms = FileIO(datafile, 3, header=False).openSynonyms()
        def wrapped(*args, **kwargs):
            kwargs['synonyms'] = synonyms
            return func(*args, **kwargs)
        return wrapped
    return wrap

@defineSynonyms("raw_data/input_data")
def clean(my_data, synonyms=None):
    # do stuff with synonyms and my_data...
    pass
I'm quite new to Python and I need to declare my own data structure, but I'm a bit confused about how to do this. I currently have:
class Particle:
    def __init__(self, mass, position, velocity, force):
        self.mass = mass
        self.position, self.velocity, self.force = position, velocity, force
    def __getitem__(self, mass):
        return self.mass
    def __getitem__(self, position):
        return self.position
    def __getitem__(self, velocity):
        return self.velocity
    def __getitem__(self, force):
        return self.force
This isn't working, however, when I try to define an instance of the class with:
p1 = Particle(mass, position, velocity, force)
Every value just ends up as a (0.0, 0.0) (which is the value for velocity and force).
Could someone explain where I'm going wrong? All I need from the data structure is to be able to pull the data out of it, nothing else. (Edit: actually, sorry, I will have to change the values a bit later on.)
Thanks
First off, you should understand that __getitem__ is syntactic sugar. It's nice to have, but if you don't need it, don't use it. __getitem__ and __setitem__ are basically for when you want to access items of your object using bracket notation, like:
p = Particle(foo)
bar = p[0]
If you don't need this, don't worry about it.
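For completeness, here is a small sketch of what bracket access could look like if you did want it. A class can only have one __getitem__; in the question's code each later definition replaces the earlier one, so only the last (returning self.force) survives. Using the attribute name as the key is my own choice for illustration.

```python
class Particle:
    FIELDS = ("mass", "position", "velocity", "force")

    def __init__(self, mass, position, velocity, force):
        self.mass = mass
        self.position = position
        self.velocity = velocity
        self.force = force

    def __getitem__(self, key):
        # A single method handles every key.
        if key in self.FIELDS:
            return getattr(self, key)
        raise KeyError(key)

    def __setitem__(self, key, value):
        if key not in self.FIELDS:
            raise KeyError(key)
        setattr(self, key, value)


p = Particle(1.0, (1.0, 2.0), (0.0, 0.0), (0.0, 0.0))
print(p["mass"])       # 1.0
p["mass"] = 2.5
print(p.mass)          # 2.5
```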
Now, onto everything else. It looks like you've got the main characteristics you want your object to carry around in your __init__ definition, which is fine. Now you need to actually bind those values onto your object using self:
class Particle:
    def __init__(self, mass, position, velocity, force):
        self.mass = mass
        self.position = position
        self.velocity = velocity
        self.force = force
That's really it. You can now access these values using dot notation, like so:
mass,pos,vel,f = 0,0,0,0 # just for readability
p = Particle(mass,pos,vel,f)
print(p.mass, p.position, p.velocity, p.force)
One of the nice things we get out of this is that if we ask python what p is, it will tell you that it is an instance of the Particle type, like so:
in [1]: p
out[1]: <__main__.Particle instance at 0x03E1fE68>
In theory, when you work with objects like this you want there to be a "layer of abstraction" between the user and the data such that they don't access or manipulate the data directly. To do this, you create functions (like you tried to do with __getitem__) to mediate interactions between the user and the data through class methods. This is nice, but often not necessary.
In your simpler case, to update the values of these attributes, you can just do it directly the same way we accessed them, with dot notation:
in [2]: p.mass
out[2]: 0
in [3]: p.mass = 2
in [4]: p.mass
out[4]: 2
You might have figured this out already, but there's nothing magical about the __init__ function, or even the class definition (where you would/should generally be defining most of your class's attributes and methods). Certain kinds of objects are pretty permissive about allowing you to add attributes whenever/wherever you want. This can be convenient, but it's generally very hacky and not good practice. I'm not suggesting that you do this, just showing you that it's possible.
in [5]: p.newattr ='foobar!'
in [6]: p.newattr
out[6]: 'foobar!'
Weird right? If this makes your skin crawl... well, maybe it should. But it is possible, and who am I to say what you can and can't do. So that's a taste of how classes work.
class Particle:
    def __init__(self, mass, position, velocity, force):
        self.mass = mass
        self.position = position
        self.velocity = velocity
        self.force = force

particle = Particle(1, 2, 3, 4)
print(particle.mass)  # 1
If you want to pretend your class has properties, you can use the @property decorator:
class Particle:
    def __init__(self, mass, position, velocity, force):
        self.mass = mass
        self.position = position
        self.velocity = velocity
        self.force = force

    @property
    def acceleration(self):
        return self.force / self.mass

particle = Particle(2, 3, 3, 8)
print(particle.acceleration)  # 4.0
Seems like collections.namedtuple is what you're after:
from collections import namedtuple
Particle = namedtuple('Particle', 'mass position velocity force')
p = Particle(1, 2, 3, 4)
print(p.velocity)
You can just put this class definition before the place where you use it. If you want to declare it, check this site: http://www.diveintopython.net/getting_to_know_python/declaring_functions.html
By the way, your question is similar to this post: Is it possible to forward-declare a function in Python? and also this post: Is it possible to use functions before declaring their body in python?
If you just need to store some attribute values (similar to a C-language struct), you can just do:
class myContainer(object):
    pass  # Do nothing

myContainerObj = myContainer()
myContainerObj.storedAttrib = 5
print(myContainerObj.storedAttrib)
In Python 3.7+ there is the dataclasses module. It lets you quickly create your own class to hold data using a decorator, @dataclass.
The @dataclass decorator allows you to quickly define and add functionality to a class that is mostly intended to hold data.
A data class for your problem might be implemented as below. I've included type hints and default values which you might also find helpful.
from dataclasses import dataclass

@dataclass
class Particle:
    mass: float
    position: float
    velocity: float = 0.0
    force: float = 0.0
Here is a useful article which explains how to use data classes in Python 3.7+ and some other features.
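As a quick illustration of what the decorator generates for you, here is a short usage sketch with the same Particle definition. The values are arbitrary examples.

```python
from dataclasses import dataclass


@dataclass
class Particle:
    mass: float
    position: float
    velocity: float = 0.0
    force: float = 0.0


p = Particle(mass=2.0, position=1.5)  # velocity and force use their defaults
print(p)  # Particle(mass=2.0, position=1.5, velocity=0.0, force=0.0)

p.force = 8.0  # fields are ordinary attributes and can be changed later
print(p == Particle(2.0, 1.5, 0.0, 8.0))  # True: __eq__ compares field by field
```

The generated __init__, __repr__ and __eq__ cover the "pull the data out and change it later" use case without writing any methods by hand.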