I work with a bit of JSON in my code, in both Objective-C and Python (3, of course!). Someday I'll probably be forced to work with it in Java. All of these platforms have libraries for parsing JSON strings into native objects, usually dictionaries with stuff in them.
So every time I write a method/function that either produces or consumes JSON data, I'm torn, because sometimes they consume or produce it in string form, and sometimes in the higher-level parsed form.
For example, let's say I have a Script object that reifies some scheduling, and I can turn it into json for easy http transmission or mongo-ification, or whatever. And so I make two methods:
class Script(object):
    def toJson(self):
        ...

    def fromJson(self, json):
        ...
While these methods communicate that the Script object can populate itself or represent itself via JSON, it's totally unclear which form. Is the json parameter there a dict or a string?
So I'm wondering if others have evolved naming conventions that help clarify this?
I run into a similar issue (although in my case it's XML). It may not be elegant, but I have a tendency to use the type name when returning that type, and to append 'str' when returning a stringified version of the type. E.g.:
class Script:
    def tojson(self):
        ...

    def tojsonstr(self):
        ...

    def fromjson(self, json):
        ...

    def fromjsonstr(self, jsonstr):
        ...
I've experimented w/ adding underscores, but in the long run, this just creates cumbersome identifiers (and my fingers get twisted typing longer strings).
I suppose you could get clever and test the passed object with isinstance (at least in the case of fromX()), e.g.:
import json as jsonlib  # avoid shadowing by the parameter name

def fromjson(self, json):
    if isinstance(json, str):       # basestring on Python 2
        json = jsonlib.loads(json)  # string form: parse it first
    # from here on, json is an already-parsed object (dict/list)
    ...
Because I do a bit of pymongo as well, where it's common to refer to these structured dictionaries as docs, I began using that term (after discussing this question with a number of peers elsewhere). So I have methods/functions that look like:
def toDoc(self):
    ...
and
- (void) fromDoc: (NSDictionary*) doc {
    ...
}
Then, if I want to paper over those with JSON variants that do the to/from string encoding, I name those with JSON.
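A minimal sketch of the resulting convention (the single name field is just illustrative):

import json

class Script:
    def toDoc(self):
        # dict form: ready for pymongo or for json.dumps
        return {"name": self.name}

    def fromDoc(self, doc):
        # populate self from an already-parsed dict
        self.name = doc["name"]

    def toJSON(self):
        # string form: the doc, encoded
        return json.dumps(self.toDoc())

    def fromJSON(self, text):
        self.fromDoc(json.loads(text))

s = Script()
s.fromJSON('{"name": "nightly"}')
print(s.toDoc())  # {'name': 'nightly'}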
I realized in the end that in JavaScript itself this doesn't come up, because it's just an object already. But in languages like Python and Objective-C we tend to use intermediary literal objects such as dictionaries and lists as we step between our domain objects and JSON. What I really needed was some word that meant "JavaScriptishObjectStructure" or something like that.
Related
In Python, I quite often generate a pickle file to preserve the work I have done during programming.
Is there any possibility to store something like a docstring in the pickle that explains how the pickle was generated and what its meaning is?
Because you can combine all kinds of items into dictionaries, tuples, and lists before pickling them, I would say the most straightforward solution would be to use a dictionary that has a docstring key.
pickle_dict = {'objs': [some, stuff, inhere],
               'docstring': 'explanation of those objects'}
Of course, depending on what you are pickling, you may want key-value pairs for each object instead of a list of objects.
When you open the pickle back up, you can just read the docstring to remember how this pickle came to be.
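A minimal end-to-end sketch (the objects, docstring, and filename are all illustrative):

import pickle

results = [1, 2, 3]  # whatever you computed

pickle_dict = {
    'objs': results,
    'docstring': 'Results of the first three runs, computed from the raw data.',
}

with open('results.pkl', 'wb') as f:
    pickle.dump(pickle_dict, f)

# Later: read the docstring to remember how this pickle came to be
with open('results.pkl', 'rb') as f:
    loaded = pickle.load(f)
print(loaded['docstring'])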
As an alternative solution, I often just need to save one or two integer values about the pickle. In this case, I choose to save them in the filename of the pickle. Depending on what you are doing, this could be preferable, since you can read the "docstring" without having to unpickle the file.
DataFrames and lists don't typically have docstrings because they are data. The docstring specification says:
A docstring is a string literal that occurs as the first statement in a module, function, class, or method definition. Such a docstring becomes the __doc__ special attribute of that object.
You can create any of these to make a docstring associated with the process that uses your data, for example the main class of your module.
class MyClass:
    """My docstring"""
    def __init__(self, df):
        self.df = df  # Your dataframe
        ...
Something like this seems like it is closest to what you were asking within the conventions of the language.
I have a function which needs to behave differently depending on the type of the parameter taken in. My first impulse was to include some calls to isinstance, but I keep seeing answers on Stack Overflow saying that this is bad form and unpythonic, without much reason why it's bad form and unpythonic. For the latter, I suppose it has something to do with duck typing, but what's the big deal about checking if your arguments are of a specific type? Isn't it better to play it safe?
Consult this great post
My opinion on the matter is this:
If you are restricting your code, don't do it.
If you are using it to direct your code, then limit it to very specific cases.
Good Example: (this is okay)
def write_to_file(var, content, close=True):
    if isinstance(var, str):
        var = open(var, 'w')
    var.write(content)
    if close:
        var.close()
Bad Example: (this is bad)
import io

def write_to_file(var, content, close=True):
    if not isinstance(var, io.IOBase):  # `file` in Python 2
        raise Exception('expected a file')
    var.write(content)
    if close:
        var.close()
Using isinstance limits the objects which you can pass to your function. For example:
def add(a, b):
    if isinstance(a, int) and isinstance(b, int):
        return a + b
    else:
        raise ValueError
Now you might try to call it:
add(1.0, 2)
expecting to get 3 but instead you get an error because 1.0 isn't an integer. Clearly, using isinstance here prevented our function from being as useful as it could be. Ultimately, if our objects taste like a duck when we roast them, we don't care what type they were to begin with just as long as they work.
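Dropping the check keeps the function as useful as it can be, a quick sketch:

def add(a, b):
    # works for ints, floats, strings, lists... anything that supports +
    return a + b

print(add(1.0, 2))        # 3.0
print(add('foo', 'bar'))  # foobar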
However, there are situations where the opposite is true:
def read(f):
    if isinstance(f, str):  # basestring in Python 2
        # got a filename: open it and read the contents
        with open(f) as fin:
            return fin.read()
    else:
        # got a file-like object: just read from it
        return f.read()
The point is, you need to decide the API that you want your function to have. Cases where your function should behave differently based on the type exist, but are rare (checking for strings to open files is one of the more common uses that I know of).
Because doing so explicitly prevents duck-typing.
Here's an example. The csv module allows me to write data to a file in CSV format. For that reason, its functions accept a file object as a parameter. But what if I didn't want to write to an actual file, but to something like a StringIO object? That's a perfectly good use of it, since StringIO implements the necessary read and write methods. But if csv were explicitly checking for an actual object of type file, that would be forbidden.
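A quick sketch of that scenario:

import csv
import io

buf = io.StringIO()          # not a real file, but it has a write() method
writer = csv.writer(buf)     # csv only cares that write() exists
writer.writerow(['a', 'b', 'c'])
print(repr(buf.getvalue()))  # 'a,b,c\r\n'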
Generally, Python takes the view that we should allow things as much as possible - it's the same reasoning behind the lack of real private variables in classes.
Sometimes usage of isinstance just reimplements polymorphic dispatch. Look at str(...): it calls object.__str__(...), which is implemented by each type individually. By implementing __str__ you can reuse any code that depends on str(...), by extending that one object instead of having to manipulate the built-in str(...).
Basically this is the culminating point of OOP: you want polymorphic behaviour, you do not want to spell out types.
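A small sketch of that idea:

class Fraction:
    def __init__(self, num, den):
        self.num, self.den = num, den

    def __str__(self):
        # str(obj) dispatches here; callers never check the type
        return f'{self.num}/{self.den}'

# any code that calls str() works unchanged for the new type
for value in [42, 3.14, Fraction(1, 2)]:
    print(str(value))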
There are valid reasons to use it though.
Disclaimer: this is perhaps a quite subjective question with no 'right' answer but I'd appreciate any feedback on best-practices and program design. So here goes:
I am writing a library where text files are read into Text objects. Now these might be initialized with a list of file-names or directly with a list of Sentence objects. I am wondering what the best / most Pythonic way to do this might be because, if I understand correctly, Python doesn't directly support method overloading.
One example I found in Scikit-Learn's feature extraction module simply passes the type of the input as an argument while initializing the object. I assume that once this parameter is set it's just a matter of handling the different cases internally:
if input == 'filename':
    # glob and read files
    ...
elif input == 'content':
    # do something else
    ...
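Filled out slightly, such a flag-dispatching initializer might look like this (a sketch; the names and the read-whole-files policy are illustrative, not Scikit-Learn's actual code):

import glob

class Text:
    def __init__(self, data, input='content'):
        if input == 'filename':
            # treat data as a glob pattern and read the matching files
            self.contents = []
            for name in glob.glob(data):
                with open(name) as f:
                    self.contents.append(f.read())
        elif input == 'content':
            # treat data as the text itself
            self.contents = [data]
        else:
            raise ValueError('unknown input type: %r' % input)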
While this is easy to implement, it doesn't look like a very elegant solution. So I am wondering if there is a better way to handle multiple types of inputs to initialize a class that I am overlooking.
One way is to just create classmethods with different names for the different ways of instantiating the object:
class Text(object):
    def __init__(self, data):
        # handle data in whatever "basic" form you need
        ...

    @classmethod
    def fromFiles(cls, files):
        # process the list of filenames into the form that `__init__` needs
        processed_data = ...
        return cls(processed_data)

    @classmethod
    def fromSentences(cls, sentences):
        # process the list of Sentence objects into the form that `__init__` needs
        processed_data = ...
        return cls(processed_data)
This way you just create one "real" or "canonical" initialization method that accepts whatever "lowest common denominator" format you want. The specialized fromXXX methods can preprocess different types of input to convert them into the form they need to be in to pass to that canonical instantiation. The idea is that you call Text.fromFiles(...) to make a Text from filenames, or Text.fromSentences(...) to make a Text from sentence objects.
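A concrete sketch, assuming the canonical form is a list of sentence strings and the listed files exist (one sentence per line is just an illustrative policy):

class Text:
    def __init__(self, sentences):
        # canonical form: a list of sentence strings
        self.sentences = list(sentences)

    @classmethod
    def fromFiles(cls, filenames):
        sentences = []
        for name in filenames:
            with open(name) as f:
                sentences.extend(line.strip() for line in f)
        return cls(sentences)

text_a = Text(['A sentence.', 'Another one.'])
text_b = Text.fromFiles(['chapter1.txt', 'chapter2.txt'])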
It can also be acceptable to do some simple type-checking if you just want to accept one of a few enumerable kinds of input. For instance, it's not uncommon for a class to accept either a filename (as a string) or a file object. In that case you'd do:
def __init__(self, file):
    if isinstance(file, str):  # basestring in Python 2
        # If a string filename was passed in, open the file before proceeding
        file = open(file)
    # Now you can handle file as a file object
    ...
This becomes unwieldy if you have many different types of input to handle, but if it's something relatively contained like this (e.g., an object or the string "name" that can be used to get that object), it can be simpler than the first method I showed.
You can use duck typing: first treat the arguments as if they are of type X; if that raises an exception, assume they are of type Y, and so on:
class Text(object):
    def __init__(self, *init_vals):
        try:
            fileobjs = [open(fname) for fname in init_vals]
        except TypeError:
            # Then we consider them to be file objects already.
            fileobjs = init_vals
        try:
            sentences = [parse_sentences(fobj) for fobj in fileobjs]
        except TypeError:
            # Then init_vals were Sentence objects already.
            sentences = fileobjs
Note that the absence of type checking means that the method actually accepts any type that implements one of the interfaces you actually use (e.g. file-like objects, Sentence-like objects, etc.).
This method becomes quite heavy if you want to support a lot of different types, but I'd consider that bad code design anyway. Accepting more than two or three types as initializers will probably confuse any programmer who uses your class, since they will always have to think "wait, did X also accept Y, or was it Z that accepted Y...".
It's probably better to design the constructor to accept only two or three different interfaces, and to provide the user with functions/classes that convert other commonly used types to those interfaces.
When writing a Python class that has different functions for getting the data and for parsing the data, what is the most correct way?
You can write it so that you populate self.data... one attribute at a time and then run parse functions to populate self.parsed_data.... Or is it more correct to write functions that accept self.data and return self.parsed_data..?
Examples below.
MyClass1 populates self.variables, and MyClass2 takes them as parameters.
I think MyClass2 is "most" correct.
So, which is correct, and why? I have been trying to decide between these two coding styles for a while, and I want to know which is considered best practice.
class MyClass1(object):
    def __init__(self):
        self.raw_data = None

    def _parse_data(self):
        # This is a fairly complex xml/json parsing function
        raw_data = self.raw_data
        data = raw_data  # Much work is done here to transform raw_data
        cache.set('cache_key', data, 600)  # Cache for 10 minutes
        return data

    def _populate_data(self):
        # This function grabs data from an external source
        self.raw_data = 'some raw data, xml, json or alike..'

    def get_parsed_data(self):
        cached_data = cache.get('cache_key')
        if cached_data:
            return cached_data
        else:
            self._populate_data()
            return self._parse_data()

mc1 = MyClass1()
print(mc1.get_parsed_data())
class MyClass2(object):
    def _parse_data(self, raw_data):
        # This is a fairly complex xml/json parsing function
        data = raw_data  # After some complicated work of parsing raw_data
        cache.set('cache_key', data, 600)  # Cache for 10 minutes
        return data

    def _get_data(self):
        # This function grabs data from an external source
        return 'some raw data, xml, json or alike..'

    def get_parsed_data(self):
        cached_data = cache.get('cache_key')
        if cached_data:
            return cached_data
        else:
            return self._parse_data(self._get_data())

mc2 = MyClass2()
print(mc2.get_parsed_data())
Ultimately, it comes down to personal preference. But IMO, it's better to just have a module-level function called parse_data which takes in the raw data, does a bunch of work, and returns the parsed data. I assume your cache keys are somehow derived from the raw data, which means the parse_data function can also implement your caching logic.
The reason I prefer a function vs having a full-blown class is the simplicity. If you want to have a class which provides data fields pulled from your raw data, so that users of your objects can do something like obj.some_attr instead of having to look inside some lower-level data construct (e.g. JSON, XML, Python dict, etc.), I would make a simple "value object" class which only contains data fields, and no parsing logic, and have the aforementioned parse_data function return an instance of this class (essentially acting as a factory function for your data class). This leads to less state, simpler objects and no laziness, making your code easier to reason about.
This would also make it easier to unit test consumers of this class, because in those tests, you can simply instantiate the data object with fields, instead of having to provide a big blob of test raw data.
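A hedged sketch of that shape (the field names, the cache interface, and the key derivation are all illustrative):

from dataclasses import dataclass

@dataclass
class ParsedData:
    # plain "value object": data fields only, no parsing logic
    title: str
    count: int

def parse_data(raw_data, cache):
    key = 'parsed:%d' % hash(raw_data)  # illustrative key derived from the input
    cached = cache.get(key)
    if cached:
        return cached
    # ... complex parsing of raw_data goes here ...
    parsed = ParsedData(title='example', count=1)
    cache.set(key, parsed, 600)  # cache for 10 minutes
    return parsed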
For me the most correct class is the class the user understands and uses with as few errors as possible.
When I look at class 2, I ask myself how I would use it...
mc2 = MyClass2()
print(mc2.get_parsed_data())
I would like only:
print(get_parsed_data())
Sometimes it is better to not write classes at all.
The second way is preferable because (if I understand correctly) it's identical in efficiency and results, but avoids having an instance member for the raw data. In general you want to reduce the amount of data stored inside your objects, because each extra attribute means more worrying about consistency over time.
In other words, it's "more functional".
Think about the question this way: if, instead of having two methods, you combined this logic into one long method, would you keep track of the raw data after it is parsed? If the answer to that is yes, then it would make sense to store it as an attribute. But if you don't care about it anymore after that point, prefer the second form. Breaking out parts of your logic into "helper" subroutines should preferably avoid making changes to your class that other methods might need to care about.
I have programming experience with statically typed languages. Now, writing code in Python, I have difficulties with readability. Let's say I have a class Host:
class Host(object):
    def __init__(self, name, network_interface):
        self.name = name
        self.network_interface = network_interface
I don't understand from this definition what network_interface should be. Is it a string, like "eth0", or is it an instance of a class NetworkInterface? The only way I can think of to solve this is documenting the code with a docstring. Something like this:
class Host(object):
    '''Attributes:
    @name: a string
    @network_interface: an instance of class NetworkInterface
    '''
Or maybe there are naming conventions for things like that?
Using dynamic languages will teach you something about static languages: all the help you got from the static language that you now miss in the dynamic language, it wasn't all that helpful.
To use your example, in a static language, you'd know that the parameter was a string, and in Python you don't. So in Python you write a docstring. And while you're writing it, you realize you had more to say about it than, "it's a string". You need to say what data is in the string, and what format it should have, and what the default is, and something about error conditions.
And then you realize you should have written all that down for your static language as well. Sure, Java would force you know that it was a string, but there's all these other details that need to be specified, and you have to manually do that work in any language.
The docstring conventions are at PEP 257.
The example there follows this format for specifying arguments, you can add the types if they matter:
def complex(real=0.0, imag=0.0):
    """Form a complex number.

    Keyword arguments:
    real -- the real part (default 0.0)
    imag -- the imaginary part (default 0.0)
    """
    if imag == 0.0 and real == 0.0:
        return complex_zero
    ...
There was also a rejected PEP for docstrings for attributes (rather than constructor arguments).
The most pythonic solution is to document with examples. If possible, state what operations an object must support to be acceptable, rather than a specific type.
class Host(object):
    def __init__(self, name, network_interface):
        """Initialise host with given name and network_interface.

        network_interface -- must support the same operations as NetworkInterface

        >>> network_interface = NetworkInterface()
        >>> host = Host("my_host", network_interface)
        """
        ...
At this point, hook your source up to doctest to make sure your doc examples continue to work in future.
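For example, assuming the class lives in a module you run directly, a minimal way to execute those docstring examples:

if __name__ == '__main__':
    import doctest
    doctest.testmod()  # runs every >>> example found in this module's docstrings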
Personally, I have found it very useful to use pylint to validate my code.
If you follow pylint's suggestions, your code almost automatically becomes more readable, you improve your Python writing skills, and you respect naming conventions. You can also define your own naming conventions, and so on. It's very useful, especially for a Python beginner.
I suggest you use it.
Python, though not as overtly typed as C or Java, is still typed and will throw exceptions if you're doing things with types that simply do not play nice together.
To that end, if you're concerned about your code being used correctly, maintained correctly, etc. simply use docstrings, comments, or even more explicit variable names to indicate what the type should be.
Better yet, include code that can handle whichever type it is passed, as long as it yields a usable result.
One benefit of static typing is that types are a form of documentation. When programming in Python, you can document more flexibly and fluently. Of course in your example you want to say that network_interface should implement NetworkInterface, but in many cases the type is obvious from the context, the variable name, or by convention, and in those cases you can produce more readable code by omitting the obvious. A common approach is to describe the meaning of a parameter and give its type implicitly.
For example:
def Bar(foo, count):
    """Bar the foo the given number of times."""
    ...
This describes the function tersely and precisely. What foo and bar mean will be obvious from context, and that count is a (positive) integer is implicit.
For your example, I'd just mention the type in the document string:
"""Create a named host on the given NetworkInterface."""
This is shorter, more readable, and contains more information than a listing of the types.