I want to "zip" char and list in Python:
An example:
char = '<'
list = [3, 23, 67]
"zip"(char, list)
>>> [('<', 3), ('<', 23), ('<', 67)]
How I'm using itertools.repeat():
itertools.izip(itertools.repeat(char, len(list)), list)
>>>[('<', 3), ('<', 23), ('<', 67)]
It works, but I'd be interested in a more Pythonic solution.
You don't need itertools here.
Using list comprehension:
>>> char = '<'
>>> lst = [3, 23, 67]
>>> [(char, n) for n in lst]
[('<', 3), ('<', 23), ('<', 67)]
BTW, don't use list as a variable name. It shadows the built-in list type.
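For instance, a minimal illustration of the problem this shadowing causes:
>>> list = [3, 23, 67]
>>> list('abc')
Traceback (most recent call last):
  ...
TypeError: 'list' object is not callable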
[(char, i) for i in list]
Naming your list "list" is probably not a good idea, by the way, as this shadows the constructor for the built-in list type.
If you want something equivalent to your use of itertools - using lazy generation for iteration - then you can use generator expressions. The syntax is pretty much the same as for list comprehensions, except you enclose the expression in parentheses.
>>> c = '<'
>>> l = [3, 23, 67]
>>> my_gen = ((c, item) for item in l)
>>> for item in my_gen:
... print item
...
('<', 3)
('<', 23)
('<', 67)
For more info, here's the PEP that explains it: http://www.python.org/dev/peps/pep-0289/
If char is only ever going to be reused for all pairings, just use a list comprehension:
>>> [(char, i) for i in lst]
[('<', 3), ('<', 23), ('<', 67)]
If char is a string of characters, and you wanted to cycle through them when pairing (like zip() would for the shortest length sequence), use itertools.cycle():
>>> from itertools import cycle
>>> chars = 'fizz'
>>> lst = range(6)
>>> zip(chars, lst)
[('f', 0), ('i', 1), ('z', 2), ('z', 3)]
>>> zip(cycle(chars), lst)
[('f', 0), ('i', 1), ('z', 2), ('z', 3), ('f', 4), ('i', 5)]
Note how the characters of the string 'fizz' are reused to pair up with the numbers 4 and 5; they'll continue to be cycled to match any length list (which must be finite).
If you really want to use zip, here is how:
l = [3, 23, 67]
zip('<' * len(l), l)
[('<', 3), ('<', 23), ('<', 67)]
In further detail, itertools.repeat(char, len(list)) produces much the same result as '<' * 3, and both work with zip (you could write zip(itertools.repeat(char, len(l)), l), too).
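As a side note (assuming Python 3's lazy zip, or itertools.izip on Python 2), you can even drop the length argument, because zip stops at the shortest iterable:
>>> import itertools
>>> l = [3, 23, 67]
>>> list(zip(itertools.repeat('<'), l))
[('<', 3), ('<', 23), ('<', 67)]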
Mark Lutz in his book "Learning Python" gives an example:
>>> [(x,y) for x in range(5) if x%2==0 for y in range(5) if y%2==1]
[(0, 1), (0, 3), (2, 1), (2, 3), (4, 1), (4, 3)]
>>>
A bit later he remarks that a 'map and filter equivalent' of this is possible, though complex and nested.
The closest one I ended up with is the following:
>>> list(map(lambda x:list(map(lambda y:(y,x),filter(lambda x:x%2==0,range(5)))), filter(lambda x:x%2==1,range(5))))
[[(0, 1), (2, 1), (4, 1)], [(0, 3), (2, 3), (4, 3)]]
>>>
The order of the tuples is different and a nested list had to be introduced. I'm curious what the exact equivalent would be.
A note to append to #Kasramvd's explanation.
Readability is important in Python. It's one of the features of the language. Many will consider the list comprehension the only readable way.
Sometimes, however, especially when you are working with multiple iterations of conditions, it is clearer to separate your criteria from logic. In this case, using the functional method may be preferable.
from itertools import product

def even_and_odd(vals):
    return (vals[0] % 2 == 0) and (vals[1] % 2 == 1)

n = range(5)
res = list(filter(even_and_odd, product(n, n)))
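A quick check of what this sketch produces (the same pairs as the original list comprehension):
>>> res
[(0, 1), (0, 3), (2, 1), (2, 3), (4, 1), (4, 3)]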
One important point to notice is that your nested list comprehension is of O(n²) order, meaning that it loops over the product of two ranges. If you want to use map and filter you have to create all those combinations. You can do that before or after filtering, but whatever you do, you can't get all the combinations with those two functions alone, unless you change the ranges and/or modify something else.
One completely functional approach is to use itertools.product() and filter as following:
In [16]: from itertools import product
In [17]: list(filter(lambda x: x[0]%2==0 and x[1]%2==1, product(range(5), range(5))))
Out[17]: [(0, 1), (0, 3), (2, 1), (2, 3), (4, 1), (4, 3)]
Also note that a nested list comprehension with two iterations is generally more readable than multiple map/filter calls. Regarding performance, built-in functions only beat a list comprehension when every function in the chain is itself a built-in, so everything runs at C level. Once you break the chain with something like a lambda, which is a Python-level operation, the code won't be faster than a list comprehension.
I think the only confusing part of the expression [(x, y) for x in range(5) if x % 2 == 0 for y in range(5) if y % 2 == 1] is that an implicit flatten operation is hidden in it.
Let's consider the simplified version of the expression first:
def even(x):
    return x % 2 == 0

def odd(x):
    return not even(x)

c = map(lambda x: map(lambda y: [x, y],
                      filter(odd, range(5))),
        filter(even, range(5)))
print(c)
# i.e. for each even X we have a list of odd Ys:
# [
# [[0, 1], [0, 3]],
# [[2, 1], [2, 3]],
# [[4, 1], [4, 3]]
# ]
However, we need pretty much the same thing, but as the flattened list [(0, 1), (0, 3), (2, 1), (2, 3), (4, 1), (4, 3)].
From the official Python docs we can grab the recipe for a flatten function:
from itertools import chain
flattened = list(chain.from_iterable(c)) # we need list() here to unroll an iterator
print(flattened)
Which is basically an equivalent for the following list comprehension expression:
flattened = [x for sublist in c for x in sublist]
print(flattened)
# ... which is basically equivalent to:
# result = []
# for sublist in c:
#     for x in sublist:
#         result.append(x)
range supports a step argument, so I came up with this solution, using itertools.chain.from_iterable to flatten the inner lists:
from itertools import chain

list(chain.from_iterable(
    map(
        lambda x: list(map(lambda y: (x, y), range(1, 5, 2))),
        range(0, 5, 2)
    )
))
Output:
Out[415]: [(0, 1), (0, 3), (2, 1), (2, 3), (4, 1), (4, 3)]
Given a list of entries (e.g. pieces of data, which are tuples), how to sort the list according to one column (feature, e.g. an int) and return not the entire sorted list of entries, but the list of its original indices (like in the function np.argsort())?
I tried to use a lambda expression but do not know how to incorporate the original indices:
list1sorted=sorted(list1, key=lambda x: x[1])
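A minimal sketch of the usual pure-Python idiom (sorting the indices instead of the entries; the sample data here is made up, and it sorts by the second column as in your key):
>>> list1 = [(13, 5), (6, 2), (3, 7)]
>>> sorted(range(len(list1)), key=lambda i: list1[i][1])
[1, 0, 2]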
import pandas as pd
lst = [13,6,3,2,1,7,6,8]
othr = [5,2,7,9,2,5,7,10]
df = pd.DataFrame({"list1": lst, "list2": othr})
result = df.sort_values("list1")
Here the DataFrame df contains the two lists; result is sorted by list1, and you can read the original indices from the index of the sorted DataFrame (result.index).
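For example, to get the original positions as a plain Python list (a small follow-up, assuming the DataFrame above):
original_indices = result.index.tolist()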
Not a short answer, but if you must:
alist = [(1, "b"), (5, "a"), (3, "c")]
index = {}
for i, item in enumerate(alist):
    index[item] = i
original_indexes = [index[x] for x in sorted(alist, key=lambda x: x[0])]
my variation:
def sort_index(z):
    """
    >>> sort_index([(1,"b"),(5,"a"),(3,"c")])
    [0, 2, 1]
    """
    number = [a[0] for a in z]
    return [x[1] for y in number for x in zip(sorted(number), range(len(z))) if x[0] == y]
In the dark ages it was usual to use DSU (Decorate, Sort, Undecorate) to sort a list of objects according to an arbitrary attribute.
Now that we have the key argument, we can turn this pattern around and keep only the decoration...
def argsort(l, field_no):
    return (t[0] for t in sorted(enumerate(l), key=lambda x: x[1][field_no]))
Here the decoration is produced by the usual enumerate, which gives us the index of each item. So we sort a list of 2-tuples, the first element being the index and the second the corresponding element of the original list; we use the key argument to sort according to a field of the original list element, and we throw away everything but the index...
Below is a brief demo of this approach:
In [1]: from random import shuffle
In [2]: l = [(chr(60+i), i) for i in range(10)]
In [3]: shuffle(l); l
Out[3]:
[('@', 4),
('?', 3),
('A', 5),
('<', 0),
('>', 2),
('C', 7),
('E', 9),
('B', 6),
('=', 1),
('D', 8)]
In [4]: def argsort(l, field_no):
...: return (t[0] for t in sorted(enumerate(l), key=lambda x:x[1][field_no]))
...:
In [5]: for i in argsort(l, 1): print(l[i])
('<', 0)
('=', 1)
('>', 2)
('?', 3)
('@', 4)
('A', 5)
('B', 6)
('C', 7)
('D', 8)
('E', 9)
Note that argsort here returns a generator; change return (...) to return [...] if you need a list.
I have a function that ends up producing this:
exList=[([('Community Chest', 1), ('Jail', 1)], array([10, 17])), ([('Jail', 1), ('Chance', 1)], array([10, 22]))]
As you can see, it is a list, and each element, i.e.
[('Community Chest', 1), ('Jail', 1)], array([10, 17])
is wrapped in a tuple.
I've tried removing all parentheses like this:
for element in exList:
    temp = ""
    for ch in element:
        if ch not in SYMBOLS:
            temp += ch
    results.append(temp)
print(results)
But it causes problems because the above code only works on a tuple, and not a list (I know, it's really confusing).
What I ultimately need is to remove the outermost parentheses in order to get this:
exList=[[('Community Chest', 1), ('Jail', 1)], array([10, 17]), [('Jail', 1), ('Chance', 1)], array([10, 22])]
As you can see, I want to remove the outermost parentheses.
Could you guys point me in the right direction?
Use a list comprehension and the itertools module:
import itertools
print [i for i in itertools.chain(*exList)]
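A quick sketch of what that produces, with the numpy arrays replaced by plain lists for brevity:
>>> import itertools
>>> exList = [(['a', 'b'], [10, 17]), (['c', 'd'], [10, 22])]
>>> list(itertools.chain(*exList))
[['a', 'b'], [10, 17], ['c', 'd'], [10, 22]]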
To remove the outside parentheses in:
exList=[([('Community Chest', 1), ('Jail', 1)], array([10, 17])), ([('Jail', 1), ('Chance', 1)], array([10, 22]))]
Simply name a new list and use numpy to make an array out of each element in the list:
import numpy as np
newlist = np.array(exList[0])
And do so for every tuple within the list.
I have an input list for example :
mylist = [('a', [(2, 4), (0, 5)]), ('b', [(3, 9), (1, 1)]), ("'", None), ('c', [(1,7), (2, 8)])]
I have to carry out comparisons between the values in the list. An item like (2, 4) denotes a range. The None entry in the list acts as a boundary, and the comparisons take place within the boundaries, so here a and b are compared first. The comparison is basically between the given values under different conditions, such as whether the ranges overlap, and the ranges change after being compared with a neighbour's ranges.
if within the boundary:  (something like "if not None: then continue")
    do the comparisons between the items inside the boundary
if None:
    move to the next boundary
    do the comparisons between the items inside the next boundary
The comparison follows simple rules. For example, comparing (2, 4) and (3, 9): these two overlap partially, so the common part is chosen, and the result is (3, 4).
I have written the code for all the comparison rules, but I tried it without boundaries. The comparisons should happen within boundaries, and I could not express the boundaries in code. Thank you.
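For the overlap rule itself, a minimal sketch (the helper name overlap is just for illustration):
def overlap(r1, r2):
    """Return the common part of two ranges, or None if they don't overlap."""
    lo, hi = max(r1[0], r2[0]), min(r1[1], r2[1])
    return (lo, hi) if lo <= hi else None

print(overlap((2, 4), (3, 9)))  # (3, 4), as in the example above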
You can group items by testing whether their second value is None, using itertools.groupby.
>>> import itertools
>>> mylist
[('a', [(2, 4), (0, 5)]), ('b', [(3, 9), (1, 1)]), ("'", None), ('c', [(1, 7), (2, 8)])]
>>> grp = itertools.groupby(mylist, lambda i: i[1] is None)
>>> res = [tuple(i[1]) for i in grp if not i[0]]  # use parentheses instead of brackets for a lazier generator expression
>>> pprint.pprint(res)
[(('a', [(2, 4), (0, 5)]), ('b', [(3, 9), (1, 1)])),
(('c', [(1, 7), (2, 8)]),)]
Now you can use a simple for loop for comparisions:
for item in res:
    # do comparisons
Building on the code from your other questions, if you want to handle each part of res separately and accumulate the results, you can do it like this (using the method from #utdemir's answer):
import itertools
from operator import itemgetter

print "Original List"
print '\n'.join(str(l) for l in phonemelist)

grp = itertools.groupby(phonemelist, itemgetter(1))
res = [tuple(v) for k, v in grp if k]
print '\n'.join(str(l) for l in res)

newlists = []
# for each part between the markers
for item in res:
    # update the ranges and add it to the overall list
    newlists.append(update_ranges(item))

print "\n after applying co-articulation rules"
print '\n\n'.join('\n'.join(str(i) for i in l) for l in newlists)
Is there a better way to sort a list by nested tuple values than writing an itemgetter alternative that extracts the nested value:
def deep_get(*idx):
    def g(t):
        for i in idx:
            t = t[i]
        return t
    return g
>>> l = [((2,1), 1),((1,3), 1),((3,6), 1),((4,5), 2)]
>>> sorted(l, key=deep_get(0,0))
[((1, 3), 1), ((2, 1), 1), ((3, 6), 1), ((4, 5), 2)]
>>> sorted(l, key=deep_get(0,1))
[((2, 1), 1), ((1, 3), 1), ((4, 5), 2), ((3, 6), 1)]
I thought about using compose, but that's not in the standard library:
sorted(l, key=compose(itemgetter(1), itemgetter(0)))
Is there something I missed in the libs that would make this code nicer?
The implementation should work reasonably with 100k items.
Context: I would like to sort a dictionary of items that form a histogram. The keys are tuples (a, b) and the value is the count. In the end the items should be sorted by count descending, then by a and b. An alternative is to flatten the tuple and use itemgetter directly, but that way a lot of tuples would be generated.
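For reference, a minimal sketch of that combined ordering on made-up data:
>>> hist = {(2, 1): 1, (1, 3): 1, (3, 6): 1, (4, 5): 2}
>>> sorted(hist.items(), key=lambda kv: (-kv[1], kv[0]))
[((4, 5), 2), ((1, 3), 1), ((2, 1), 1), ((3, 6), 1)]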
Yes, you could just use key=lambda x: x[0][1]
Your approach is quite good, given the data structure that you have.
Another approach would be to use another structure.
If you want speed, the de facto standard NumPy is the way to go. Its job is to efficiently handle large arrays. It even has some nice sorting routines for arrays like yours. Here is how you would write your sort over the counts, and then over (a, b):
>>> arr = numpy.array([((2,1), 1),((1,3), 1),((3,6), 1),((4,5), 2)],
dtype=[('pos', [('a', int), ('b', int)]), ('count', int)])
>>> print numpy.sort(arr, order=['count', 'pos'])
[((1, 3), 1) ((2, 1), 1) ((3, 6), 1) ((4, 5), 2)]
This is very fast (it's implemented in C).
If you want to stick with standard Python, a list containing (count, a, b) tuples would automatically get sorted in the way you want by Python (which uses lexicographic order on tuples).
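For instance, a tiny illustration of that lexicographic ordering:
>>> sorted([(2, 4, 5), (1, 2, 1), (1, 1, 3)])
[(1, 1, 3), (1, 2, 1), (2, 4, 5)]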
I compared two similar solutions. The first one uses a simple lambda:
def sort_one(d):
    result = d.items()
    result.sort(key=lambda x: (-x[1], x[0]))
    return result
Note the minus on x[1], because you want the sort to be descending on count.
The second one takes advantage of the fact that sort in Python is stable. First, we sort by (a, b) (ascending). Then we sort by count, descending:
def sort_two(d):
    result = d.items()
    result.sort()
    result.sort(key=itemgetter(1), reverse=True)
    return result
The first one is 10-20% faster (both on small and large datasets), and both complete in under 0.5 seconds on my Q6600 (one core used) for 100k items. So avoiding the creation of tuples doesn't seem to help much.
This might be a little faster version of your approach:
l = [((2,1), 1), ((1,3), 1), ((3,6), 1), ((4,5), 2)]

def deep_get(*idx):
    def g(t):
        return reduce(lambda t, i: t[i], idx, t)
    return g
>>> sorted(l, key=deep_get(0,1))
[((2, 1), 1), ((1, 3), 1), ((4, 5), 2), ((3, 6), 1)]
Which could be shortened to:
def deep_get(*idx):
    return lambda t: reduce(lambda t, i: t[i], idx, t)
or even just simply written-out:
sorted(l, key=lambda t: reduce(lambda t, i: t[i], (0,1), t))
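Note that on Python 3 reduce is no longer a built-in, so these snippets would additionally need from functools import reduce.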