good practice for string.partition in python - python

Sometime I write code like this:
a,temp,b = s.partition('-')
I just need to pick the first and 3rd elements. temp would never be used. Is there a better way to do this?
In other terms, is there a better way to pick distinct elements to make a new list?
For example, I want to make a new list using the elements 0,1,3,7 from the old list. The
code would be like this:
newlist = [oldlist[0],oldlist[1],oldlist[3],oldlist[7]]
It's pretty ugly, isn't it?

Be careful using
a, _, b = s.partition('-')
sometimes _ is use for internationalization (gettext), so you wouldn't want to accidentally overwrite it.
Usually I would do this for partition rather than creating a variable I don't need
a, b = s.partition('-')[::2]
and this in the general case
from operator import itemgetter
ig0137 = itemgetter(0, 1, 3, 7)
newlist = ig0137(oldlist)
The itemgetter is more efficient than a list comprehension if you are using it in a loop

For the first there's also this alternative:
a, b = s.partition('-')[::2]
For the latter, since there's no clear interval there is no way to do it too clean. But this might suit your needs:
newlist = [oldlist[k] for k in (0, 1, 3, 7)]

You can use Python's extended slicing feature to access a list periodically:
>>> a = range(10)
>>> # Pick every other element in a starting from a[1]
>>> b = a[1::2]
>>> print b
>>> [1, 3, 5, 7, 9]
Negative indexing works as you'd expect:
>>> c = a[-1::-2]
>>> print c
>>> [9, 7, 5, 3, 1]
For your case,
>>> a, b = s.partition('-')[::2]

the common practice in Python to pick 1st and 3rd values is:
a, _, b = s.partition('-')
And to pick specified elements in a list you can do :
newlist = [oldlist[k] for k in (0, 1, 3, 7)]

If you don't need to retain the middle field you can use split (and similarly rsplit) with the optional maxsplit parameter to limit the splits to the first (or last) match of the separator:
a, b = s.split('-', 1)
This avoids a throwaway temporary or additional slicing.
The only caveat is that with split, unlike partition, the original string is returned if the separator is not found. The attempt to unpack will fail as a result. The partition method always returns a 3-tuple.

Related

How to use itertools.combinations_with_replacement by avoiding particular type of combination

How can I use
itertools.combinations_with_replacement
by leaving out some particular type of combinations. In the case
list(combinations_with_replacement([1, 2, 3, 4], 3))
I need to avoid (1,1,1), (2,2,2), (3,3,3), (4,4,4) by leaving all the rest.
Here is a good reference, but I could not find what I need.
You may combine itertools.combinations_with_replacement() with a simple filter() function:
combinations = combinations_with_replacement([1, 2, 3, 4], 3)
filtered = filter(lambda c: len(set(c)) > 1, combinations)
You are in charge of selecting which combinations should be filtered or not. Here using a lambda function: if all elements are the same, discard it.
If the input list is sorted or contains distinct elements, this would also yield the intended result:
combinations = combinations_with_replacement([1, 2, 3, 4], 3)
filtered = (c for c in combinations if c[0] != c[-1]) # Use square brackets if a list is needed
The solution works because in each generated combination tuple, elements are sorted according to their indices in the input list. Therefore if c[0] == c[-1], then for any element e such that index(c[0])<=index(e)<=index(c[-1]), c[0]==e==c[-1] holds according to the constraint of the input.

Why doesn't Python3's range/slice object support containment testing for other ranges/slices?

Python3's range objects support O(1) containment checking for integers (1) (2)
So one can do 15 in range(3, 19, 2) and get the correct answer True
However, it doesn't support containment checking b/w two ranges
a = range(0, 10)
b = range(3, 7)
a in b # False
b in a # False, even though every element in b is also in a
a < b # TypeError: '<' not supported between instances of 'range' and 'range'
It seems that b in a is interpreted as is any element in the range 'a' equal to the object 'b'?
However, since the range cannot contain anything but integers, range(...) in range(...) will always return False. IMHO, such a query should be answered as is every element in range 'b' also in range 'a'? Given that range only stores the start, stop, step and length, this query can also be answered in O(1).
The slice object doesn't help either. It doesn't implement the __contains__ method, and the __lt__ method simply compares two slices as tuples (which makes no sense)
Is there a reason behind the current implementation of these, or is it just a "it happened to be implemented this way" thing?
It looks like the implementation of __contains__ for ranges is range_contains, which just checks if the given element is in the iterable, with a special case for longs.
As you have correctly observed, e in b returns true iff e is an element in b. Any other implementation, such as one that checks if e is a subset of b, would be ambiguous. This is especially problematic in Python, which doesn't require iterables be homogenous, and allows nested lists/tuples/sets/iterables.
Consider a world where your desired behavior was implemented. Now, suppose we have the following list:
my_list = [1, 2, (3, 4), 5]
ele1 = (3, 4)
ele2 = (2, 5)
In this case, what should ele1 in my_list and ele2 in my_list return? Your desired behavior makes it tedious to write code such as
if e in my_iterable:
# ah! e must exist!
my_iterable.remove(e)
A safer, better way is to keep the current behavior, and instead use a different type-sensitive operator to implement subset predicates:
x = set([1])
y = set([1,2])
x < y # True
[1] < y # raises a TypeError
You're confusing 'b' containing 'a' with 'a' being a subset of 'b' - These are two different things.
b containing a means range(0, 10) is inside b. Let's say:
a = [1, 2, 3]
and
b = [1, 2, 3, 4, 5]
a in b is only true if the actual list [1, 2, 3] is in [[1, 2, 3], 4, 5]. So you're actually checking if the list itself is inside the other list, not that all the elements are in the other list.
A list a is a subset of b if all elements of a are inside b. In your example, b is a subset of a, yes, but the actual list b is not IN a.
If you want to do such methods, then it's probably recommended that you use a set data structure
Range objects implement the collections.abc.Sequence, It supports containment tests.
a in b
b in a
In this case, you are searching for Range object a in range b, vice versa. It should be false.

How to refer to next element in a loop?

I've been looking around for this but can't find what I would like to. I'm sure I've seen this done before but I can't seem to find it. Here's an example:
In this case I would like to take the difference of each element in an array,
#Generate sample list
B = [a**2 for a in range(10)]
#Take difference of each element
C = [(b+1)-b for b in B]
the (b+1) is to denote the next element in the array which I don't know how to do and obviously doesn't work, giving the result:
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
the result I would like is:
[1, 3, 5, 7, 9, 11, 13, 15, 17]
I understand that this result is shorter than the original array however the reason for this would be to replace ugly expressions such as:
C = [B[i+1]-B[i] for i in range(len(B)-1)]
In this case it really isn't that bad at all, but there are cases that I need to iterate through multiple variables with long expressions and it gets annoying to keep having to write the index in each time. Right now I'm hoping that there is an easy pythonic way to do this that I don't know about...
EDIT: An example of what I mean about having to do this with multiple variables would be:
X = [a for a in range(10)]
Y = [a**2 for a in range(10)]
Z = [a**3 for a in range(10)]
for x,y,z in zip(X,Y,Z):
x + (x+1) + (y-(y+1))/(z-(z+1))
where (x+1),(y+1),(z+1) denote the next element rather than:
for i in range(len(X)):
x[i] + x[i+1] + (y[i]-y[i+1])/(z[i]-z[i+1])
I am using python 2.7.5 btw
re: your edit
zip is still the right solution. You just need to zip together two iterators over the lists, the second of which should be advanced one tick.
from itertools import izip,tee
cur,nxt = tee(izip(X,Y,Z))
next(nxt,None) #advance nxt iterator
for (x1,y1,z1),(x2,y2,z2) in izip(cur,nxt):
print x1 + x2 + (y1-y2)/(z1-z2)
If you don't like the inline next call, you can use islice like #FelixKling mentioned: izip(cur,islice(nxt, 1, None)).
Alternative you can use zip, to create tuples of the current value, next value:
C = [b - a for a, b in zip(B, B[1:])]
I believe zip returns a generator in Python 3. In Python 2, you might want to use izip. And B[1:], you could use islice: islice(B, 1, None).
Maybe you want enumerate. As follows:
C = [B[b+1]-item for b,item in enumerate(B[:-1])]
or simply:
C = [B[b+1]-B[b] for b in range(len(B[:-1]))]
They both work.
Examples
>>> B = [a**2 for a in range(10)]
>>> C = [B[b+1]-item for b,item in enumerate(B[:-1])]
>>> print C
[1, 3, 5, 7, 9, 11, 13, 15, 17]
This is a pretty weird way, but it works!
b = [a**2 for a in range(10)]
print b
print reduce(lambda x, y:len(x) and x[:-1]+[y-x[-1], y] or [y], b, [])
I have created a bunk on CodeBunk so you can run the it too
http://codebunk.com/b/-JJzLIA-KZgASR_3a-I8

Pattern Matching in Python

This question might go closer to pattern matching in image processing.
Is there any way to get a cost function value, applied on different lists, which will return the inter-list proximity? For example,
a = [4, 7, 9]
b = [5, 8, 10]
c = [2, 3]
Now the cost function value for, may be a 2-tuple, (a, b) should be more than (a, c) and (b, c). This can be a huge computational task since there can be many more number of lists and all permutations would blow up the complexity of the problem. So only the set of 2-tuples would work as well.
EDIT:
The list names indicate the type of actions, and elements in them are the time at which corresponding actions occur. What I'm trying to do is to come up with set(s) of actions which have similar occurrence pattern. Since two actions cannot occur at the same time, it's the combination of intra- and inter-list distance.
Thanks in advance!
You're asking a very difficult question. Without allowing the sizes to change there are already several distance measures you could use (Euclidean, Manhattan, etc, check the See Also section for more). The one you need depends on what you think a good measure of the proximity is for whatever these lists represent.
Without knowing what you're trying to do with these lists no-one can define what a good answer would be, let alone how to compute it efficiently.
For comparing two strings or lists you can use the Levenshtein distance (Python implementation from here):
def levenshtein(s1, s2):
l1 = len(s1)
l2 = len(s2)
matrix = [range(l1 + 1)] * (l2 + 1)
for zz in range(l2 + 1):
matrix[zz] = range(zz,zz + l1 + 1)
for zz in range(0,l2):
for sz in range(0,l1):
if s1[sz] == s2[zz]:
matrix[zz+1][sz+1] = min(matrix[zz+1][sz] + 1,
matrix[zz][sz+1] + 1,
matrix[zz][sz])
else:
matrix[zz+1][sz+1] = min(matrix[zz+1][sz] + 1,
matrix[zz][sz+1] + 1,
matrix[zz][sz] + 1)
return matrix[l2][l1]
Using that on your lists:
>>> a = [4, 7, 9]
>>> b = [5, 8, 10]
>>> c = [2, 3]
>>> levenshtein(a,b)
3
>>> levenshtein(b,c)
3
>>> levenshtein(a,c)
3
EDIT: with the added explanation in the comments, you could use sets instead of lists. Since every element of a set is unique, adding an existing element again is a no-op. And you can use the set's isdisjoint method to check that two sets do not contain the same elements, or the intersection method to see which elements they have in common:
In [1]: a = {1,3,5}
In [2]: a.add(3)
In [3]: a
Out[3]: set([1, 3, 5])
In [4]: a.add(4)
In [5]: a
Out[5]: set([1, 3, 4, 5])
In [6]: b = {2,3,7}
In [7]: a.isdisjoint(b)
Out[7]: False
In [8]: a.intersection(b)
Out[8]: set([3])
N.B.: this syntax of creating sets requires at least Python 2.7.
Given the answer you gave to Michael's clarification, you should probably look up "Dynamic Time Warping".
I haven't used http://mlpy.sourceforge.net/ but its blurb says it provides DTW. (Might be a hammer to crack a nut; depends on your use case.)

proper use of list comprehensions - python

Normally, list comprehensions are used to derive a new list from an existing list. Eg:
>>> a = [1, 2, 3, 4, 5]
>>> [i for i in a if i > 2]
[3, 4, 5]
Should we use them to perform other procedures? Eg:
>>> a = [1, 2, 3, 4, 5]
>>> b = []
>>> [b.append(i) for i in a]
[None, None, None, None, None]
>>> print b
[1, 2, 3, 4, 5]
or should I avoid the above and use the following instead?:
for i in a:
b.append(i)
You should indeed avoid using list comprehensions (along with dictionary comprehensions, set comprehensions and generator expressions) for side effects. Apart from the fact that they'd accumulate a bogus list and thus waste memory, it's also confusing. I expect a list comprehension to generate a (meaningful) value, and many would agree. Loops, on the other hand, are clearly a sequence of statements. They are expected to kick off side effects and generate no result value - no surprise.
From python documentation:
List comprehensions provide a concise way to create lists. Common
applications are to make new lists
Perhaps you want to learn more about reduce(), filter() and map() functions.
In the example you give it would make the most sense to do:
b = [i for i in a]
if for some reason you wanted to create b. In general, there is some common sense that must be employed. If using a comprehension makes your code unreadable, don't use it. Otherwise go for it.
Only use list comprehensions if you plan to use the created list. Otherwise you create it just for the GC to throw it again without ever being used.
So instead of [b.append(i) for i in a] you should use a proper for loop:
for i in a:
b.append(i)
Another solution would be through a generator expression:
b += (i for i in a)
However, if you want to append the whole list, you can simply do
b += a
And if you just need to apply a function to the elements before adding them to the list, you can always use map:
b += map(somefunc, a)
b = []
a = [1, 2, 3, 4, 5]
b.extend (a)

Categories