Are python methods chainable? - python

s = set([1,2,3])
It would be elegant to do the following:
a.update(s).update(s)
It doesn't work as I thought; I expected a to contain set([1,2,3,1,2,3,1,2,3]).
So I'm wondering: does Python advocate this chainable practice?

set.update() returns None, so you can't chain updates like that.
The usual rule in Python is that methods that mutate don't return the object.
Contrast this with methods on immutable objects, which must return a new object. str.replace() is one example, and it can be chained.
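A minimal sketch of the difference, using the set from the question (the names result and merged are illustrative only):

```python
s = {1, 2, 3}
a = set()

# In-place mutation returns None, so there is nothing to chain on.
result = a.update(s)
print(result)   # None

# union() builds and returns a new set, so calls can be chained.
merged = set().union(s).union(s)
print(merged)   # {1, 2, 3}
```

Note that the chained version still yields three elements, because a set keeps only unique items.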

It depends.
Methods that modify an object usually return None so you can't call a sequence of methods like this:
L.append(2).append(3).append(4)
And hope to have the same effect as:
L.append(2)
L.append(3)
L.append(4)
You'll probably get an AttributeError because the first call to append returns None and None does not have an append method.
Methods that create a new object return that object, so for example:
"some string".replace("a", "b").replace("c", "d")
Is perfectly fine.
Usually immutable objects return new objects, while mutable ones return None but it depends on the method and the specific object.
Anyway it's certainly not a feature of the language itself but only a way to implement the methods. Chainable methods can be implemented in probably any language so the question "are python methods chainable" does not make much sense.
The only reasonable question would be "Are python methods always/forced to be/etc. chainable?", and the answer to this question is "No".
In your example set can only contain unique items, so the result that you show does not make any sense. You probably wanted to use a simple list.

Also, the update method does not return a set; it returns None.
So you cannot chain another update call on NoneType.
This will therefore give you an error:
a.update(s).update(s)
However, a set can contain only unique values, so even if you put the updates on separate lines, you won't get the set you want.

Yes, you can chain method calls in Python. As to whether it's good practice, there are a number of libraries out there which advocate using chained calls. For example, SQLAlchemy's tutorial makes extensive use of this style of coding. You frequently encounter code snippets like
session.query(User).filter(User.name.in_(['Edwardo', 'fakeuser'])).all()
A sensible guideline to adopt is to ask yourself whether it'll make the code easier to read and maintain. Always strive to make code readable.

I wrote a simple example. Chainable methods should always return an object, such as self:
class Chain(object):
    """Chain example"""
    def __init__(self):
        self._content = ''
    def update(self, new_content):
        """method of appending content"""
        self._content += new_content
        return self
    def __str__(self):
        return self._content
>>> c = Chain()
>>> c.update('str1').update('str2').update('str3')
>>> print c
str1str2str3

Related

How to efficiently remove duplicates from a python list based on equality, not hashes

We've got a list of instances of a class. We effectively want a Set i.e. a group with no repeated elements. Elements of the list which are the same equate, but their hashes are different as they have been instantiated separately. So a==b is True, a is b is False.
Is there a way to vectorise this problem or otherwise make it efficient. The only solutions we can think of involved for loops, and it seems like there might be a more efficient solution.
EDIT: I think it's different from the "Elegant ways to support equivalence" question, as the equivalence works well; it's just that Set relies on comparing hashes.
EDIT: The for loop solution would go something like this: sort the list, then iterate over it, removing the current value if it's the same as the last value.
EDIT: To be clear, we don't own this class, we just have instances of it. So we could wrap the instances and implement a more useful hash function, however, this seems like it might be almost as expensive as the for loop approach - could be wrong though
EDIT: sorry if it feels like I'm moving the goalposts a bit here - there isn't a simple val of the object that can be subbed in for a hash, that approach would need to somehow generate UIDs for each different instance.
I assume you are working with a class you created yourself and that you've implemented your own equality method.
It's true that the default hash method inherited from object returns different values for different instances. From what I have read, it's either based on id() or it's randomized, depending on the Python version.
However, you can easily implement your own __hash__ method to solve this.
How to implement a good __hash__ function in python
__hash__ should return the same value for objects that are equal. It also shouldn't change over the lifetime of the object; generally you only implement it for immutable objects.
This may not be the answer that you want, but it is a clean and easy way to do it. Then you can just create a Set normally.
Maybe this is what you need?
Make the hash a function of the class fields.
Here is a simple example:
class A:
    def __init__(self, v):
        self.val = v
    def __eq__(self, other):
        return self.val == other.val
    def __hash__(self):
        return self.val
    def __repr__(self):
        return 'A(%s)' % self.val
a = set([A(2), A(3), A(4), A(2), A(10), A(4)])
print(a)
# {A(10), A(2), A(3), A(4)}
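If the class cannot be modified and no stable hash can be derived from its fields, one fallback is a plain equality scan. This is only a sketch, not a fast solution: it performs O(n²) comparisons. The names unique_by_eq and Point are hypothetical and used only for illustration.

```python
def unique_by_eq(items):
    """Return items with duplicates removed, comparing with == only."""
    unique = []
    for item in items:
        # `in` on a list checks identity, then ==; no hashing is involved.
        if item not in unique:
            unique.append(item)
    return unique


class Point(object):
    """Hypothetical stand-in for a third-party class: equal by value, hashed by identity."""
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        return isinstance(other, Point) and self.x == other.x

    __hash__ = object.__hash__   # deliberately identity-based


points = [Point(1), Point(2), Point(1)]
print(len(unique_by_eq(points)))   # 2
```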

Python pseudo-immutable object field

I currently need to partially create a Python object and be able to update it for some time. Although, I must not be able to update it once I used the object as a dictionary key.
Of course there is the solution of marking the fields as private, which is mostly a warning for the programmer, and I will actually go for that solution.
But I stumbled on another solution and I want to know if this could be a good idea, or if it could simply go horribly wrong. Here it is:
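class FooIsNowImmutable(Exception):
    """Raised when set_bar is called after the object has been hashed."""

class Foo():
    def __init__(self, bar):
        self._bar = bar
        self._has_been_hashed = False
    def __hash__(self):
        # Hashing flips the flag, which makes the object read-only from now on.
        self._has_been_hashed = True
        return self._bar.__hash__()
    def __eq__(self, other):
        return self._bar == other._bar
    def __copy__(self):
        return Foo(self._bar)
    def set_bar(self, bar):
        if self._has_been_hashed:
            raise FooIsNowImmutable
        else:
            self._bar = bar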
Some testing proved it to work as desired, I can no longer use set_bar once I, say, used my object as a dictionary key.
What do you think? Is it a good idea? Will it turn against me? Is there an easier way? And is it somehow a bad practice?
Doing it that way is a bit fragile, since you never know when something might be used as a dictionary key, or when its hash might be called for some other reason. An object isn't supposed to "know" whether it's being used as a dictionary key. It will be confusing to have code that may raise an exception just because some other code somewhere else put the object in a dictionary.
Following the Python philosophy of "explicit is better than implicit", it would be safer to just give your object a method called .finalize() or .lock() or something, which would set a flag indicating the object is immutable. You could also reverse the exception-raising logic, so that __hash__ raises an exception if the object is not yet locked (rather than mutation raising an exception if the object has been hashed).
You would then call .lock() when you're ready to make the object immutable. It makes more sense to explicitly set it immutable when you're done with whatever mutating you need to do, rather than implicitly assuming that as soon as you use it in a dictionary, you're done mutating it.
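A minimal sketch of the explicit-lock idea, including the reversed check in __hash__. The method name lock and the exception types chosen here are assumptions for illustration, not part of the original code.

```python
class Foo(object):
    def __init__(self, bar):
        self._bar = bar
        self._locked = False

    def lock(self):
        """Explicitly mark the object as finished; it may be hashed from now on."""
        self._locked = True

    def set_bar(self, bar):
        if self._locked:
            raise AttributeError("Foo is locked and can no longer be modified")
        self._bar = bar

    def __eq__(self, other):
        return isinstance(other, Foo) and self._bar == other._bar

    def __hash__(self):
        # Reversed logic: hashing is refused until the object is locked.
        if not self._locked:
            raise TypeError("Foo must be locked before it can be hashed")
        return hash(self._bar)


foo = Foo(1)
foo.set_bar(2)       # fine: not locked yet
foo.lock()
d = {foo: "value"}   # fine: hashing is allowed after lock()
```

With this design, the failure shows up at the point where an unfinished object is used as a key, rather than later at some unrelated mutation.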
You can do that, but I'm not sure I'd recommend it. Why do you need it in a dictionary?
It requires a lot more awareness of the state of the object... think a file object. Would you put one in a dictionary? It has to be opened for a lot of the functions to work, and once it's closed, you can't do them anymore. The user has to be aware in the surrounding code which state the object is in.
For files, that makes sense - after all, you don't normally hold files open across large parts of your program, or if you do, they have very defined init and close codes; something similar has to make sense for your object. Especially if you have some APIs that take the object, but expect an immutable version, and others that take the same object, but expect to change it...
I have used the lock method before, and it works well for complex, read-only objects that you want to initialize once and then make sure no one is messing with. For example, you load a copy of a (say, English) dictionary from disk... it has to be mutable while you are populating it, but you don't want anyone to accidentally modify it, so locking it is a great idea. I would only use it if it was a one-time lock though - something you are locking and unlocking seems like a recipe for disaster.
There are two solutions IMHO if you just want to create a version you can use in hashable places. First is to explicitly create an immutable copy when you put it in a dictionary - tuple and frozenset are examples of this sort of behaviour... if you want to put a list in a dict, you can't, but you can create a tuple from it first, and that can be hashed. Create a frozen version of your object, then it's very clear by looking at the object type whether it's expected to be mutable or immutable, and so cases where it was used incorrectly are easily seen.
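The built-in types already illustrate the frozen-copy approach. A small sketch (the names tags and cache are illustrative):

```python
tags = ["a", "b"]
cache = {}

# cache[tags] = 1 would raise TypeError, because lists are unhashable.
cache[tuple(tags)] = 1        # frozen, ordered copy
cache[frozenset(tags)] = 2    # frozen, unordered copy

tags.append("c")              # the original stays mutable
print(cache[("a", "b")])      # 1
```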
Second, if you really want it to be hashable, but need it to be mutable... that's actually legal, but implemented a little different. It goes back to the idea of hashing... hashing is used both for optimized lookups, and equality.
The first is to ensure you can get objects back... you put something in a dictionary, and it hashes to a value of 4 - goes in slot 4. Then you modify it. Then you go to look it up again, and now it hashes to 9 - there's nothing in slot 9, or worse, a different object, and you're broken.
Second is equality - for things like sets, I need to know if my object is already in there. I can hash, but if you know anything about hashing, you still need to check equality to check for hash collisions.
That doesn't preclude supporting __hash__ and being mutable, but it's unusual. You need to decide for your item what makes it the same, even though it's mutable. What you need to do then is give each object a unique id. Technically, you may be able to get away with id(self), but something like the uuid module is probably a better possibility. The UUID4 (or technically, the hash of the UUID4) is what determines both the hash and equality; two objects that contain the same UUID4 should be the exact same object; two objects that have the exact same data but a different UUID4 would be different objects.
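A sketch of that identity-based scheme, assuming a hypothetical Record class whose hash and equality both come from a UUID4 assigned at construction:

```python
import uuid


class Record(object):
    """Hypothetical mutable-but-hashable object; identity comes from a UUID, not the data."""
    def __init__(self, data):
        self.data = data
        self._uid = uuid.uuid4()   # fixed for the lifetime of the object

    def __eq__(self, other):
        return isinstance(other, Record) and self._uid == other._uid

    def __hash__(self):
        return hash(self._uid)


r = Record([1, 2])
index = {r: "first"}
r.data.append(3)                 # mutation does not affect the hash
print(index[r])                  # first
print(r == Record([1, 2, 3]))    # False: same data, different UUID
```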

What is the interpreter looking for?

I never realized just how poor a programmer I was until I came across this exercise below. I am to write a Python file that allows all of the tests below to pass without error.
I believe the file I write needs to be a class, but I have absolutely no idea what should be in my class. I know what the question is asking, but not how to make classes or to respond to the calls to the class with the appropriate object(s).
Please review the exercise code below, and then see my questions at the end.
File with tests:
import unittest
from allergies import Allergies

class AllergiesTests(unittest.TestCase):
    def test_ignore_non_allergen_score_parts(self):
        self.assertEqual(['eggs'], Allergies(257).list)

if __name__ == '__main__':
    unittest.main()
1) I don't understand the "list" method at the end of this assertion. Is it the built-in Python function "list()," or is it an attribute that I need to define in my "Allergies" class?
2) What type of object is "Allergies(257).list"
self.assertEqual(['eggs'], Allergies(257).list)
3) Do I approach this by defining something like the following?
def list(self):
    list_of_allergens = ['eggs','pollen','cat hair', 'shellfish']
    return list_of_allergens[0]
1) I don't understand the "list" method at the end of this assertion. Is it the built-in Python function "list()," or is it an attribute that I need to define in my "Allergies" class?
From the ., you can tell that it's an attribute that you need to define on your Allergies class—or, rather, on each of its instances.*
2) What type of object is "Allergies(257).list"
Well, what is it supposed to compare equal to? ['eggs'] is a list of strings (well, of string). So, unless you're going to create a custom type that likes to compare equal to lists, you need a list.
3) Do I approach this by defining something like the following?
def list(self):
    list_of_allergens = ['eggs','pollen','cat hair', 'shellfish']
    return ist_of_allergens
No. You're on the wrong track right off the bat. This will make Allergies(257).list into a method. Even if that method returns a list when it's called, the test driver isn't calling it. It has to be a list. (Also, more obviously, ['eggs','pollen','cat hair', 'shellfish'] is not going to compare equal to ['eggs'], and ist_of_allergens isn't the same thing as list_of_allergens.)
So, where is that list going to come from? Well, your class is going to need to assign something to self.list somewhere. And, since the only code from your class that's getting called is your constructor (__new__) and initializer (__init__), that "somewhere" is pretty limited. And you probably haven't learned about __new__ yet, which means you have a choice of one place, which makes it pretty simple.
* Technically, you could use a class attribute here, but that seems less likely to be what they're looking for. For that matter, Allergies doesn't even have to be a class; it could be a function that just defines a new type on the fly, constructs it, and adds list to its dict. But PEP 8 naming standards and "don't make things more complex for no good reason" both point to wanting a class here.
From how it's used, list is an attribute of the object returned by Allergies, which may be a function that returns an object or simply the call to construct an object of type Allergies. In this last case, the whole thing can be easily implemented as:
class Allergies:
    def __init__(self, n):
        # probably you should do something more
        # interesting with n
        if n==257:
            self.list=['eggs']
This looks like one of the exercises from exercism.io.
I have completed the exercise, so I know what's involved.
'list' is supposed to be an attribute of each instance of the class Allergies, and is itself an object of type list. At least that's one straightforward way of dealing with it. I defined it in the __init__ method of the class. In my opinion, it's confusing that they called it 'list', as this clashes with Python's list type.
snippet from my answer:
class Allergies(object):
    allergens = ["eggs", "peanuts",
                 "shellfish", "strawberries",
                 "tomatoes", "chocolate",
                 "pollen","cats"]
    def __init__(self, score):
        # score_breakdown returns a list
        self.list = self.score_breakdown(score)  # let the name of this function be a little clue ;)
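One possible shape for score_breakdown, as a sketch only. It assumes the allergen at position i in the list corresponds to bit i of the score (eggs = 1, peanuts = 2, shellfish = 4, and so on), which is an assumption about the exercise and not something stated in the snippet above.

```python
class Allergies(object):
    allergens = ["eggs", "peanuts", "shellfish", "strawberries",
                 "tomatoes", "chocolate", "pollen", "cats"]

    def __init__(self, score):
        # An instance attribute, so Allergies(257).list is a plain list.
        self.list = self.score_breakdown(score)

    def score_breakdown(self, score):
        # Assumption: allergen i is present when bit i of the score is set.
        # Bits beyond the known allergens (e.g. 256) are ignored.
        return [name for i, name in enumerate(self.allergens)
                if score & (1 << i)]


print(Allergies(257).list)   # ['eggs']
```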
If I were you I would go and do some Python tutorials. I would start with the basics, even if it feels like you are covering ground you have already travelled. It's absolutely worth knowing your fundamentals as solidly as possible. For this, I could recommend Udacity or Codecademy.

Python: emulate C-style pass-by-reference for variables

I have a framework with some C-like language. Now I'm re-writing that framework and the language is being replaced with Python.
I need to find appropriate Python replacement for the following code construction:
SomeFunction(&arg1)
What this does is a C-style pass-by-reference so the variable can be changed inside the function call.
My ideas:
just return the value like v = SomeFunction(arg1)
is not so good, because my generic function can have a lot of arguments like SomeFunction(1,2,'qqq','vvv',.... and many more)
and I want to give the user ability to get the value she wants.
Return the collection of all the arguments, whether or not they have changed, like: resulting_list = SomeFunction(1,2,'qqq','vvv',.... and many more) interesting_value = resulting_list[3]
this can be improved by giving names to the values and returning dictionary interesting_value = resulting_list['magic_value1']
It's not good because we have constructions like
DoALotOfStuff( [SomeFunction1(1,2,3,&arg1,'qq',val2),
                SomeFunction2(1,&arg2,v1),
                AnotherFunction(),
                ...
               ], flags1, my_var,... )
And I wouldn't like to load the user with list of list of variables, with names or indexes she(the user) should know. The kind-of-references would be very useful here ...
Final Response
I compiled all the answers with my own ideas and was able to produce the solution. It works.
Usage
SomeFunction(1,12, get.interesting_value)
AnotherFunction(1, get.the_val, 'qq')
Explanation
Anything prefixed with get. is a kind-of reference, and its value will be filled in by the function. There is no need to define the value beforehand.
Limitation - currently I support only numbers and strings, but these are sufficient for my use-case.
Implementation
wrote a Getter class which overrides getattribute and produces any variable on demand
all newly created variables have a pointer to their container Getter and support the method set(self,value)
when set() is called, it checks whether the value is an int or a string and creates an object inheriting from int or str accordingly, with the same set() method added. This new object replaces our instance in the Getter container
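The mechanism described in these three points could look roughly like the sketch below. It is a reconstruction under assumptions, not the poster's actual code: the helper class names are hypothetical, and it uses __getattr__ (called only when normal lookup fails) to create names on demand.

```python
class _Settable(object):
    """Mixin: knows its container and name, and can replace itself there."""
    _container = None
    _name = None

    def set(self, value):
        if isinstance(value, int):
            new = _IntValue(value)
        elif isinstance(value, str):
            new = _StrValue(value)
        else:
            raise TypeError("only int and str are supported")
        new._container, new._name = self._container, self._name
        self._container._values[self._name] = new


class _Placeholder(_Settable):
    """What get.some_name yields before any value has been set."""
    def __init__(self, container, name):
        self._container, self._name = container, name


class _IntValue(_Settable, int):
    pass


class _StrValue(_Settable, str):
    pass


class Getter(object):
    def __init__(self):
        self._values = {}

    def __getattr__(self, name):
        # Only reached for names that normal attribute lookup cannot find.
        if name.startswith("_"):
            raise AttributeError(name)
        if name not in self._values:
            self._values[name] = _Placeholder(self, name)
        return self._values[name]


def some_function(a, b, out):
    out.set(a + b)   # "writes through" the kind-of reference


get = Getter()
some_function(1, 12, get.interesting_value)
print(get.interesting_value)   # 13
```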
Thank you everybody. I will mark as "answer" the response which led me on my way, but all of you helped me somehow.
I would say that your best, cleanest, bet would be to construct an object containing the values to be passed and/or modified - this single object can be passed, (and will automatically be passed by reference), in as a single parameter and the members can be modified to return the new values.
This will simplify the code enormously and you can cope with optional parameters, defaults, etc., cleanly.
>>> class C:
...     def __init__(self):
...         self.a = 1
...         self.b = 2
...
>>> c = C()
>>> def f(o):
...     o.a = 23
...
>>> f(c)
>>> c.a
23
Note
I am sure that you could extend this idea to have a class of parameter that carried immutable and mutable data into your function with fixed member names plus storing the names of the parameters actually passed then on return map the mutable values back into the caller parameter name. This technique could then be wrapped into a decorator.
I have to say that it sounds like a lot of work compared to re-factoring your existing code to a more object oriented design.
This is how Python works already:
def func(arg):
    arg += ['bar']

arg = ['foo']
func(arg)
print arg
Here, the change to arg automatically propagates back to the caller.
For this to work, you have to be careful to modify the arguments in place instead of re-binding them to new objects. Consider the following:
def func(arg):
    arg = arg + ['bar']

arg = ['foo']
func(arg)
print arg
Here, func rebinds arg to refer to a brand new list and the caller's arg remains unchanged.
Python doesn't come with this sort of thing built in. You could make your own class which provides this behavior, but it will only support a slightly more awkward syntax where the caller would construct an instance of that class (equivalent to a pointer in C) before calling your functions. It's probably not worth it. I'd return a "named tuple" (look it up) instead--I'm not sure any of the other ways are really better, and some of them are more complex.
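A short sketch of the named-tuple approach using collections.namedtuple. The type name Result, its fields, and some_function are hypothetical and chosen only for illustration.

```python
from collections import namedtuple

# Hypothetical result type; the field names are illustrative.
Result = namedtuple("Result", ["status", "interesting_value"])


def some_function(a, b):
    return Result(status="ok", interesting_value=a + b)


res = some_function(1, 12)
print(res.interesting_value)   # 13
status, value = res            # still unpacks like a plain tuple
```

The caller picks the value she wants by name, without having to remember positions in a list.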
There is a major inconsistency here. The drawbacks you're describing against the proposed solutions are related to such subtle rules of good design, that your question becomes invalid. The whole problem lies in the fact that your function violates the Single Responsibility Principle and other guidelines related to it (function shouldn't have more than 2-3 arguments, etc.). There is really no smart compromise here:
either you accept one of the proposed solutions (i.e. Steve Barnes's answer concerning your own wrappers or John Zwinck's answer concerning usage of named tuples) and refrain from focusing on good design subtleties (as your whole design is bad anyway at the moment)
or you fix the design. Then your current problem will disappear as you won't have the God Objects/Functions (the name of the function in your example - DoALotOfStuff really speaks for itself) to deal with anymore.

Why should functions always return the same type?

I read somewhere that functions should always return only one type
so the following code is considered as bad code:
def x(foo):
    if 'bar' in foo:
        return (foo, 'bar')
    return None
I guess the better solution would be
def x(foo):
    if 'bar' in foo:
        return (foo, 'bar')
    return ()
Wouldn't it be cheaper memory-wise to return None than to create a new empty tuple, or is this difference too small to notice even in larger projects?
Why should functions return values of a consistent type? To meet the following two rules.
Rule 1 -- a function has a "type" -- inputs mapped to outputs. It must return a consistent type of result, or it isn't a function. It's a mess.
Mathematically, we say some function, F, is a mapping from domain, D, to range, R. F: D -> R. The domain and range form the "type" of the function. The input types and the result type are as essential to the definition of the function as is the name or the body.
Rule 2 -- when you have a "problem" or can't return a proper result, raise an exception.
def x(foo):
    if 'bar' in foo:
        return (foo, 'bar')
    raise Exception( "oh, dear me." )
You can break the above rules, but the cost of long-term maintainability and comprehensibility is astronomical.
"Wouldn't it be cheaper memory wise to return a None?" Wrong question.
The point is not to optimize memory at the cost of clear, readable, obvious code.
It's not so clear that a function must always return objects of a limited type, or that returning None is wrong. For instance, re.search can return a _sre.SRE_Match object or a NoneType object:
import re
match=re.search('a','a')
type(match)
# <type '_sre.SRE_Match'>
match=re.search('a','b')
type(match)
# <type 'NoneType'>
Designed this way, you can test for a match with the idiom
if match:
    # do xyz
If the developers had required re.search to return a _sre.SRE_Match object, then
the idiom would have to change to
if match.group(1) is None:
    # do xyz
There would not be any major gain by requiring re.search to always return a _sre.SRE_Match object.
So I think how you design the function must depend on the situation and in particular, how you plan to use the function.
Also note that both _sre.SRE_Match and NoneType are instances of object, so in a broad sense they are of the same type. So the rule that "functions should always return only one type" is rather meaningless.
Having said that, there is a beautiful simplicity to functions that return objects which all share the same properties. (Duck typing, not static typing, is the Python way!) It can allow you to chain together functions, as in foo(bar(baz)), and know with certainty the type of object you'll receive at the other end.
This can help you check the correctness of your code. By requiring that a function returns only objects of a certain limited type, there are fewer cases to check. "foo always returns an integer, so as long as an integer is expected everywhere I use foo, I'm golden..."
Best practice in what a function should return varies greatly from language to language, and even between different Python projects.
For Python in general, I agree with the premise that returning None is bad if your function generally returns an iterable, because iterating without testing becomes impossible. Just return an empty iterable in this case, it will still test False if you use Python's standard truth testing:
ret_val = x()
if ret_val:
    do_stuff(ret_val)
and still allow you to iterate over it without testing:
for child in x():
    do_other_stuff(child)
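A self-contained sketch of both usages, assuming a hypothetical find_children function that always returns a list:

```python
def find_children(node):
    """Hypothetical lookup: always returns a list, possibly empty."""
    tree = {"root": ["a", "b"]}
    return tree.get(node, [])


# Iterating needs no None check; for a missing node the body never runs.
for child in find_children("missing"):
    print(child)

# Truth testing still works, because an empty list is falsy.
if not find_children("missing"):
    print("no children")   # no children
```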
For functions that are likely to return a single value, I think returning None is perfectly acceptable, just document that this might happen in your docstring.
Here are my thoughts on all that and I'll try to also explain why I think that the accepted answer is mostly incorrect.
First of all programming functions != mathematical functions. The closest you can get to mathematical functions is if you do functional programming but even then there are plenty of examples that say otherwise.
Functions do not have to have input
Functions do not have to have output
Functions do not have to map input to output (because of the previous two bullet points)
A function in terms of programming is to be viewed simply as a block of memory with a start (the function's entry point), a body (empty or otherwise) and exit point (one or multiple depending on the implementation) all of which are there for the purpose of reusing code that you've written. Even if you don't see it a function always "returns" something. This something is actually the address of next statement right after the function call. This is something you will see in all of its glory if you do some really low-level programming with an Assembly language (I dare you to go the extra mile and do some machine code by hand like Linus Torvalds who ever so often mentions this during his seminars and interviews :D). In addition you can also take some input and also spit out some output. That is why
def foo():
    pass
is a perfectly correct piece of code.
So why would returning multiple types be bad? Well...It isn't at all unless you abuse it. This is of course a matter of poor programming skills and/or not knowing what the language you're using can do.
Wouldn't it be cheaper memory-wise to return None than to create a new empty tuple, or is this difference too small to notice even in larger projects?
As far as I know - yes, returning a NoneType object would be much cheaper memory-wise. Here is a small experiment (returned values are bytes):
>>> import sys
>>> sys.getsizeof(None)
16
>>> sys.getsizeof(())
48
Based on the type of object you are using as your return value (numeric type, list, dictionary, tuple etc.) Python manages the memory in different ways including the initially reserved storage.
However you have to also consider the code that is around the function call and how it handles whatever your function returns. Do you check for NoneType? Or do you simply check if the returned tuple has length of 0? This propagation of the returned value and its type (NoneType vs. empty tuple in your case) might actually be more tedious to handle and blow up in your face. Don't forget - the code itself is loaded into memory so if handling the NoneType requires too much code (even small pieces of code but in a large quantity) better leave the empty tuple, which will also avoid confusion in the minds of people using your function and forgetting that it actually returns 2 types of values.
Speaking of returning multiple types of value, this is the part where I agree with the accepted answer (but only partially): returning a single type makes the code more maintainable without a doubt. It's much easier to check only for type A than for A, B, C, ... etc.
However Python is an object-oriented language and as such inheritance, abstract classes etc. and all that is part of the whole OOP shenanigans comes into play. It can go as far as even generating classes on-the-fly, which I have discovered a few months ago and was stunned (never seen that stuff in C/C++).
Side note: You can read a little bit about metaclasses and dynamic classes in this nice overview article with plenty of examples.
There are in fact multiple design patterns and techniques that wouldn't even exist without so-called polymorphic functions. Below I give you two very popular topics (I can't find a better way to summarize both in a single term):
Duck typing - often part of the dynamic typing languages which Python is a representative of
Factory method design pattern - basically it's a function that returns various objects based on the input it receives.
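A small sketch of such a factory function. The writer classes and the make_writer name are hypothetical; the point is only that the returned type depends on the input, while every returned object supports the same write method.

```python
import json


class JsonWriter(object):
    def write(self, data):
        return json.dumps(data)


class CsvWriter(object):
    def write(self, data):
        return ",".join(str(v) for v in data)


def make_writer(fmt):
    """Return a different kind of object depending on the input."""
    writers = {"json": JsonWriter, "csv": CsvWriter}
    if fmt not in writers:
        raise ValueError("unknown format: %r" % fmt)
    return writers[fmt]()


for fmt in ("json", "csv"):
    # Duck typing: the caller relies on write(), not on the concrete class.
    print(make_writer(fmt).write([1, 2, 3]))
```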
Finally whether your function returns one or multiple types is totally based on the problem you have to solve. Can this polymorphic behaviour be abused? Sure, like everything else.
I personally think it is perfectly fine for a function to return a tuple or None. However, a function should return at most 2 different types and the second one should be a None. A function should never return a string and list for example.
If x is called like this
foo, bar = x(foo)
returning None would result in a
TypeError: 'NoneType' object is not iterable
if 'bar' is not in foo.
Example
def x(foo):
    if 'bar' in foo:
        return (foo, 'bar')
    return None

foo, bar = x(["foo", "bar", "baz"])
print foo, bar
foo, bar = x(["foo", "NOT THERE", "baz"])
print foo, bar
This results in:
['foo', 'bar', 'baz'] bar
Traceback (most recent call last):
File "f.py", line 9, in <module>
foo, bar = x(["foo", "NOT THERE", "baz"])
TypeError: 'NoneType' object is not iterable
Premature optimization is the root of all evil. The minuscule efficiency gains might be important, but not until you've proven that you need them.
Whatever your language: a function is defined once, but tends to be used at any number of places. Having a consistent return type (not to mention documented pre- and postconditions) means you have to spend more effort defining the function, but you simplify the usage of the function enormously. Guess whether the one-time costs tend to outweigh the repeated savings...?
