Shared Memory for Python Classes - python

So as I was learning about classes in Python I was taught that class-level attributes are shared among all instances of a given class. That's something I don't think I've seen in any other language before.
As a result I could have multiple instances of, say, a DB class pulling DB data and dumping it into a class-level attribute. Then any instance that needed any of that data would have access to it without having to go to a cache or saved file to get it.
Currently I'm debugging an analytics class that grabs DB data via an inefficient means - one I'm currently trying to make much faster. Right now it takes several minutes for the DB data to load. And the format I've chosen for the data with ndarrays and such doesn't want to save to a file via numpy.save (I don't remember the error right now). Every time I make a little tweak the data is lost and I have to wait several minutes for it to reload.
So the thought occurred to me that I could create a simple class for holding that data. A class I wouldn't need to alter that could run under a separate iPython console (I'm using Anaconda, Python 2.7, and Spyder). That way I could link the analytics class to the shared data class in the init of the analytics class. Something like this:
def __init__(self):
    self.__shared_data = SharedData()
    self.__analytics_data_1 = self.__shared_data['analytics_data_1']
The idea would be that I would then write to self.__analytics_data_1 inside the analytics class methods. It would automatically get updated in the shared data class. I would have an iPython console open to do nothing more than hold an instance of the shared data class during debug. That way when I have to reload and reinstantiate the analytics class it just gets whatever data I've already captured. Obviously, if there is an issue with the data itself then it will need to be removed manually.
Obviously, I wouldn't want to use the same shared data class for every tool I built. But that's simple to get around. The reason I mention all of this is that I couldn't find a recipe for something like this online. There seem to be plenty of recipes out there so I'm thinking the lack of a recipe might be a sign this is a bad idea.
I have attempted to achieve something similar via Memcache on a PHP project. However, in that project I had a lot of reads and writes and it appeared that the code was causing some form of write collision, so data wasn't getting updated. (And Memcache doesn't guarantee the data will even be there.) Preventing this write collision meant a lot of extra code, extra processing time, and ultimately the code got to be too slow to be useful. I thought I might attempt it again with this shared memory in Python as well as using the shared memory for debugging purposes.
Thoughts? Warnings?

If you want to share data across your class instances, you should do:
class SharedDataClass(object):
    shared_data = SharedData()
This SharedData instance will be shared between your class instances unless you override it in the constructor.
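To make the sharing and the override concrete, here is a minimal sketch. The SharedData stand-in (a plain dict subclass) and the Analytics / IsolatedAnalytics names are assumptions made up for illustration, not the asker's real classes.

```python
class SharedData(dict):
    """Hypothetical stand-in for the SharedData container in the question."""


class Analytics(object):
    # Class-level attribute: created once, when the class body runs.
    shared_data = SharedData()

    def store(self, key, value):
        # Mutating the shared object is visible through every instance.
        self.shared_data[key] = value


class IsolatedAnalytics(Analytics):
    def __init__(self):
        # Assigning in the constructor creates an instance attribute
        # that shadows the class-level one, so nothing is shared.
        self.shared_data = SharedData()


a, b = Analytics(), Analytics()
a.store('analytics_data_1', [1, 2, 3])
isolated = IsolatedAnalytics()
```

Here b sees what a stored because both names resolve to the same object on the class, while isolated starts with its own empty container. All of this only holds inside a single Python process.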
Careful: the GIL may affect performance!
Sharing data "between consoles" or processes is a different story. They're separate processes, and, of course, class attributes are not shared between them. In this case, you need IPC (it could be the filesystem, a database, a process with an open socket to connect to, etc.).
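As a rough sketch of the filesystem option, assuming the loaded data can be pickled (the question doesn't say why numpy.save failed, so this is untested against the real data). The save_cache / load_cache helpers and the cache file name are hypothetical.

```python
import os
import pickle
import tempfile

# Hypothetical cache location; any path both processes can reach would do.
CACHE_PATH = os.path.join(tempfile.gettempdir(), 'analytics_cache.pkl')


def save_cache(data, path=CACHE_PATH):
    """Write the data to disk so another process can pick it up."""
    with open(path, 'wb') as fh:
        pickle.dump(data, fh, protocol=pickle.HIGHEST_PROTOCOL)


def load_cache(path=CACHE_PATH):
    """Return the cached data, or None if nothing has been saved yet."""
    if not os.path.exists(path):
        return None
    with open(path, 'rb') as fh:
        return pickle.load(fh)
```

The analytics class could call load_cache() in its constructor and fall back to the slow DB load only when it gets None back. This does no locking, so it only suits the single-writer debugging case described in the question.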

Related

OOP: Using conditional statement while initializing a class

This question is geared toward OOP best practices.
Background:
I've created a set of scripts that are either automatically triggered by cronjobs or are constantly running in the background to collect data in real time. In the past, I've used Python's smtplib to send myself notifications when errors occur or a job is successfully completed. Recently, I migrated these programs to the Google Cloud platform, which by default blocks popular SMTP ports. To get around this I used Linux's mail command to continue sending myself the reports.
Originally, my hacky solution was to have two separate modules for sending alerts that were initiated based on an argument I passed to the main script.
Ex:
$ python mycode.py my_arg
if sys.argv[1] == 'my_arg':
    mailer = Class1()
else:
    mailer = Class2()
I want to improve upon this and create a module that automatically handles this without the added code. The question I have is whether it is "proper" to include a conditional statement while initializing the class to handle the situation.
Ex:
import sys

class Alert(object):
    def __init__(self, other_args):
        # Google Cloud Platform
        if sys.platform.startswith("linux"):
            # instantiate Class1 variables and methods
            pass
        else:
            # local copy
            # instantiate Class2 variables and methods
            pass
My gut instinct says this is wrong but I'm not sure what the proper approach would be.
I'm mostly interested in answers regarding how to create OO classes/modules that handle environmental dependencies to provide the same service. In my case, a blocked port requires a different set of code altogether.
Edit: After some suggestions here are my favorite readings on this topic.
http://python-3-patterns-idioms-test.readthedocs.io/en/latest/Factory.html
This seems like a wonderful use-case for a factory class, which encapsulates the conditional, and always returns an instance of one of N classes, all of which implement the same interface, so that the rest of your code can use it without caring about the concrete class being used.
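A minimal sketch of that factory idea follows. The two mailer classes, their send method and the make_mailer function are hypothetical names; the send bodies are placeholders that return a string instead of actually sending mail.

```python
import sys


class SmtpMailer(object):
    """Hypothetical mailer for machines where SMTP ports are open."""

    def send(self, subject, body):
        # Placeholder: a real version would use smtplib here.
        return 'smtp: ' + subject


class MailCommandMailer(object):
    """Hypothetical mailer that would shell out to the `mail` command."""

    def send(self, subject, body):
        # Placeholder: a real version would invoke `mail` here.
        return 'mail: ' + subject


def make_mailer(platform=None):
    """Factory: the only place that knows which concrete class to use."""
    if platform is None:
        platform = sys.platform
    # startswith() covers both 'linux' and Python 2's 'linux2'.
    if platform.startswith('linux'):
        return MailCommandMailer()
    return SmtpMailer()


mailer = make_mailer()
```

The rest of the scripts would only ever call mailer.send(...), so the platform check lives in one function rather than in every caller or inside a class constructor.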
This is a way to do it. But I would rather use something like creating a dynamic class instance. To do that, you could have only one class instead of selecting from two different classes. The class would then take some arguments and return the result depending on the arguments provided. There are quite a few examples out there and I'm sure you can use them in your use-case. Try searching for how to create a dynamic class in Python.

How are the objects in the classic single responsibility principle example supposed to communicate?

I'm facing potentially a refactoring project at work and am having a little bit of trouble grasping the single responsibility principle example that shows up on most websites. It is the one regarding separating the connection and send/receive methods of a modem into two different objects. The project is in Python, by the way, but I don't think it is a language-specific issue.
Currently I'm working to break up a 1300 line web service driver class that somebody created (arbitrarily split into two classes but they are essentially one). On the level of responsibility I understand I need to break the connectivity, configuration, and XML manipulation responsibilities into separate classes. Right now all is handled by the class using string manipulations and the httplib.HTTPConnection object to handle the request.
So according to this example I would have a class to handle only the http connection, and a class to transfer that data across that connection, but how would these communicate? If I require a connection to be passed in when constructing the data transfer class, does that re-couple the classes? I'm just having trouble grasping how this transfer class actually accesses the connection that has been made.
With a class that huge (> 1000 lines of code) you have more to worry about than only the SRP or the DIP. I have (or "I fight") classes of similar size, and from my experience you have to write unit tests where possible. Refactor carefully (very carefully!). Automatic testing is your friend - be it unit testing as mentioned or regression testing, integration testing, acceptance testing, or whatever you are able to automatically execute. Then refactor. And then run the tests. Refactor again. Test. Refactor. Test.
There is a very good book that describes this process: Michael Feathers' "Working Effectively with Legacy Code". Read it.
For example, draw a picture that shows the dependencies of all methods and members of this class. That might help you to identify different "areas" of responsibility.
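On the question of how the two objects would talk to each other: one common arrangement is to hand the connection to the transfer object's constructor. The sketch below assumes that approach. FakeConnection and Transfer are hypothetical names, and the fake stands in for whatever class ends up wrapping httplib.HTTPConnection.

```python
class FakeConnection(object):
    """Hypothetical stand-in for a class wrapping httplib.HTTPConnection."""

    def __init__(self):
        self.sent = []

    def request(self, payload):
        # Record the payload instead of touching the network.
        self.sent.append(payload)
        return 'ok'


class Transfer(object):
    """Knows what to send; delegates how to send it to the connection."""

    def __init__(self, connection):
        # Any object with a request() method will do.
        self._connection = connection

    def send_xml(self, xml):
        return self._connection.request(xml)


conn = FakeConnection()
transfer = Transfer(conn)
result = transfer.send_xml('<ping/>')
```

Transfer depends only on the presence of a request() method, not on a concrete connection class, which is also what lets a fake be swapped in for the unit tests mentioned above.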

Pythonic Django object reuse

I've been racking my brain on this for the last few weeks and I just can't seem to understand it. I'm hoping you folks here can give me some clarity.
A LITTLE BACKGROUND
I've built an API to help serve a large website and, like all of us, I'm trying to keep the API as efficient as possible. Part of this efficiency is to NOT create an object that contains custom business logic over and over again (example: a service class) as requests are made. To give some personal background, I come from the Java world, so I'm used to using an IoC or DI to help handle object creation and injection into my classes to ensure classes are NOT created over and over on a per-request basis.
WHAT I'VE READ
While looking at many Python IoC and DI posts I've become rather confused on how to best approach creating a given class and not having to worry about the server getting overloaded with too many objects based on the amount of requests it may be handling.
Some people say an IoC or DI really isn't needed. But as I run my Django app I find that unless I construct the object I want globally (top of file) for views.py to use later, rather than within each view class or def within views.py, I run the chance of creating multiple objects of the same type, which from what I understand would cause memory bloat on the server.
So what's the right way to be pythonic to keep objects from being built over and over? Should I invest in using an IoC / DI or not? Can I safely rely on setting up my service.py files to just contain def's instead of classes that contain def's? Is the garbage collector just THAT efficient, so I don't even have to worry about it?
I've purposely not placed any code in this post since this seems like a general question, but I can provide a few code examples if that helps.
Thanks
From a confused engineer that wants to be as pythonic as possible
You come from a background where everything needs to be a class. I've programmed web apps in Java too, and sometimes it's harder to unlearn old things than to learn new things, I understand.
In Python / Django you wouldn't make anything a class unless you need many instances and need to keep state.
For a service that's hardly the case, and sometimes you'll notice in Java-like web apps some services are made singletons, which is just a workaround and a rather big anti-pattern in Python.
Pythonic
Python is flexible enough so that a "services class" isn't required, you'd just have a Python module (e.g. services.py) with a number of functions, emphasis on being a function that takes in something, returns something, in a completely stateless fashion.
# services.py
# this is a module, doesn't keep any state within,
# it may read and write to the DB, do some processing etc but doesn't remember things
def get_scores(student_id):
    return Score.objects.filter(student=student_id)

# views.py
# receives HTTP requests
def view_scores(request, student_id):
    scores = services.get_scores(student_id)
    # e.g. use the scores queryset in a template, return HTML page
Notice how if you need to swap out the service, you'll just be swapping out a single Python module (just a file really), so Pythonistas hardly bother with explicit interfaces and other abstractions.
Memory
Now per each "django worker process", you'd have that one services module, that is used over and over for all requests that come in, and when the Score queryset is used and no longer pointed at in memory, it'll be cleaned up.
I saw your other post, and well, instantiating a ScoreService object for each request, or keeping an instance of it in the global scope is just unnecessary, the above example does the job with one module in memory, and doesn't need us to be smart about it.
And if you did need to keep state in-between several requests, keeping them in online instances of ScoreService would be a bad idea anyway because now every user might need one instance, that's not viable (too many online objects keeping context). Not to mention that instance is only accessible from the same process unless you have some sharing mechanisms in place.
Keep state in a datastore
In case you want to keep state in-between requests, you'd keep the state in a datastore, and when the request comes in, you hit the services module again to get the context back from the datastore, pick up where you left it and do your business, return your HTTP response, then unused things will get garbage collected.
The emphasis being on keeping things stateless, where any given HTTP request can be processed on any given django process, and all state objects are garbage collected after the response is returned and objects go out of scope.
This may not be the fastest request/response cycle we can pull, but it's scalable as hell
Look at some major web apps written in Django
I suggest you look at some open source Django projects and look at how they're organized, you'll see a lot of the things you're busting your brains with, Djangonauts just don't bother with.

Getting and serializing the state of dynamically created python instances to a relational model

I'm developing a framework of sorts. I'm providing a base class, that will be subclassed by other developers to add behavior to the system. The instances of those classes will have attributes that my framework doesn't necessarily expect, except by inspecting those instances' __dict__. To make things even more interesting, some of those classes can be created dynamically, at any time.
I'd like some things to be handled by the framework, namely, I will need to persist those instances, display their attribute values to the user, and let her search/filter instances using those values.
I have to use a relational database. I know there are some decent Python OO databases out there, but unfortunately they're not an option in this case.
I'm not looking for a full-blown ORM too... and it may not even be an option, given that some of the classes can be created dynamically.
So, my question is, what state of a python instance do I need to serialize to ensure that I can deserialize it later on? Is it enough to look at __dict__, or are there other private attributes that I should be using?
Pickling the instances is not enough, because I'll need to unpickle them to search/filter the attribute values, and I'm afraid it's too much data to do it in-memory (instead of letting the database do it).
Just use an ORM. This is what they are for.
What you are proposing to do is create your own half-assed ORM on your own time. Save your time for your own code that does things, and use the effort other people put for free into solving this problem for you.
Note that all class creation in python is "dynamic" - this is not an issue, for, well, anything at all. In fact, if you are assembling classes programmatically, it is probably slightly easier with an ORM, because they provide reifications of fields.
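To illustrate the point that class creation is always "dynamic", here is a small sketch using the built-in type() call. The make_model helper and the field names are invented for the example and belong to no ORM.

```python
def make_model(name, field_names):
    """Build a class at runtime; `make_model` is a hypothetical helper."""

    def __init__(self, **values):
        # Store each declared field as a normal instance attribute.
        for field in field_names:
            setattr(self, field, values.get(field))

    attrs = {'__init__': __init__, 'fields': tuple(field_names)}
    return type(name, (object,), attrs)


class Static(object):
    pass


Dynamic = make_model('Sensor', ['label', 'reading'])
sensor = Dynamic(label='t1', reading=21.5)
```

Static and Dynamic are the same kind of object (both are instances of type), which is why a tool that works with ordinary classes has no reason to treat programmatically assembled ones differently.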
In the worst case, if you really do need to store your objects in a fake nosql-type schema, you will still only have to write your own backend driver if you use an existing ORM, rather than coding the whole stack yourself. (As it happens, you're not the first person to face this - solutions exist. Google "python orm store dynamically created models" and "sqlalchemy store dynamically created models".)
Candidates include:
Django ORM
SQLAlchemy
Some others you can find by googling "Python ORM".

Are verbose __init__ methods in Python bad?

I have a program that I am writing in Python that does the following:
The user enters the name of a folder. Inside that folder are 8-15 .dat files with different extensions.
The program opens those dat files, enters them into a SQL database and then allows the user to select different changes made to the database. Then the database is exported back to the .dat files. There are about 5-10 different operations that could be performed.
The way that I had planned on designing this was to create a standard class for each group of files. The user would enter the name of the folder and an object with certain attributes (file names, dictionary of files, version of files (there are different versions), etc) would get created. Determining these attributes requires opening a few of these files, reading file names, etc.
Should this action be carried out in the __init__ method? Or should this action be carried out in different instance methods that get called in the __init__ method? Or should these methods be somewhere else, and only be called when the attribute is required elsewhere in the program?
I have already written this program in Java. And I had a constructor that called other methods in the class to set the object's attributes. But I was wondering what standard practice in Python would be.
Well, there is nothing special about good OOP practices in Python. Decomposing one big method into a bunch of small ones is a great idea both in Java and in Python. Among other things, small methods give you an opportunity to write different constructors:
class GroupDescriptor(object):
    def __init__(self, file_dictionary):
        self.file_dict = file_dictionary
        self.load_something(self.file_dict['file_with_some_info'])

    @classmethod
    def from_filelist(cls, list_of_files):
        file_dict = cls.get_file_dict(list_of_files)
        return cls(file_dict)

    @classmethod
    def from_dirpath(cls, directory_path):
        # cls, not self: a classmethod has no instance yet
        files = cls.list_dir(directory_path)
        return cls.from_filelist(files)
Besides, I don't know how it is in Java, but in Python you don't have to worry about exceptions in a constructor because they are handled fine. Therefore, it is totally normal to work with such exception-prone things as files.
It looks like the actions you are describing are initialization, so it'd be perfectly ok to put them into __init__. On the other hand, these actions seem to be pretty expensive, and probably useful in other parts of the program, so you might want to encapsulate them in some separate function.
There's no problem with having a long __init__ method, but I would avoid it simply because it's more difficult to test. My approach would be to create smaller methods which are called from __init__. This way you can test them and the initialization separately.
Whether they should be called when needed or run up front really depends on what you need them to do. If they are expensive operations, and are usually not all needed, then maybe it's better to only call them when needed. On the other hand, you might want to run them up front so that there is no lag when the attributes are required.
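For the "only call them when needed" option, a minimal sketch using a property that computes its value on first access. FileGroup, _read_version and the 'v1' value are placeholders invented for illustration; load_count exists only to show how often the expensive step runs.

```python
class FileGroup(object):
    """Hypothetical class; names are illustrative, not from the question."""

    def __init__(self, folder):
        self.folder = folder
        self._version = None   # not computed yet
        self.load_count = 0    # only here to show how often the work runs

    def _read_version(self):
        # Placeholder for the expensive step (opening files, parsing headers).
        self.load_count += 1
        return 'v1'

    @property
    def version(self):
        # Compute on first access, then reuse the stored value.
        if self._version is None:
            self._version = self._read_version()
        return self._version


group = FileGroup('some_folder')
first = group.version
second = group.version
```

Constructing the object stays cheap, and _read_version can be tested on its own. Running the work up front instead would just mean calling self._read_version() inside __init__.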
It's not clear from your question whether you actually need a class though. I have no experience with Java, but I understand that everything in it is a class. In Python it is perfectly acceptable to just have a function if that's all that's required, and to only create classes when you need instances and other classy things.
The __init__ method is called when the object is instantiated.
Coming from a C++ background, I believe it's not good to do actual work other than initialization in the constructor.
