Numpy - group data into sum values - python

Say I have an array of values:
a = np.array([1,5,4,2,4,3,1,2,4])
and three 'sum' values:
b = 10
c = 9
d = 7
Is there a way to group the values in a into groups of sets where the values combine to equal b,c and d? For example:
b: [5,2,3]
c: [4,4,1]
d: [4,2,1]
b: [5,4,1]
c: [2,4,3]
d: [4,2,1]
b: [4,2,4]
c: [5,4]
d: [1,1,2,3]
Note the sum of b,c and d should remain the same (==26). Perhaps this operation already has a name?

Here's a naive implementation using itertools
from itertools import chain, combinations
def group(n, iterable):
    s = list(iterable)
    return [c for c in chain.from_iterable(combinations(s, r)
                                           for r in range(len(s)+1))
            if sum(c) == n]
group(5, range(5))
yields
[(1, 4), (2, 3), (0, 1, 4), (0, 2, 3)]
Note that this will probably be very slow for large lists, because we're essentially constructing and filtering the power set of the list (2**len(s) candidate subsets).
You could use this for
sum_vals = [10, 9, 7]
a = [1, 5, 4, 2, 4, 3, 1, 2, 4]
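list(map(lambda x: group(x, a), sum_vals))
and then zip them together. Note that this finds the candidate combinations for each sum independently, so you would still need to pick one combination per sum such that no element of a is used twice.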
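If the goal is a single partition in which every element of a is used exactly once, a small backtracking search is one option. The sketch below assumes positive integers; partition_by_sums is a hypothetical helper name (not a NumPy or itertools function), and like the power-set approach it can take exponential time in the worst case.

```python
def partition_by_sums(values, targets):
    """Assign every value to exactly one group so that group i sums to targets[i].

    Returns a list of groups, or None if no such assignment exists.
    Assumes all values are positive integers (hypothetical helper).
    """
    groups = [[] for _ in targets]
    remaining = list(targets)  # how much each group still needs

    def place(i):
        if i == len(values):
            # every value is placed; succeed only if every target is met exactly
            return all(r == 0 for r in remaining)
        v = values[i]
        for g in range(len(targets)):
            if v <= remaining[g]:
                groups[g].append(v)
                remaining[g] -= v
                if place(i + 1):
                    return True
                # undo and try the next group
                remaining[g] += v
                groups[g].pop()
        return False

    return groups if place(0) else None


a = [1, 5, 4, 2, 4, 3, 1, 2, 4]
result = partition_by_sums(a, [10, 9, 7])
print(result)
```

This returns only the first partition found; collecting all of them would mean recording a copy of groups at each success instead of returning early.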


How does enumerate(zip(*k_fold(dataset, folds))) work?

If we have:
a = ['a', 'aa', 'aaa']
b = ['b', 'bb', 'bbb']
for i, (x, y) in enumerate(zip(a, b)):
    print(i, x, y)
then the code prints:
0 a b
1 aa bb
2 aaa bbb
To iterate over all elements of the two lists, they must have the same size.
Now, if we have the following snippet:
for fold, (train_idx, test_idx, val_idx) in enumerate(zip(*k_fold(dataset, folds))):
    pass
where len(dataset) = 1000 and folds = 3, how does the code work in terms of *k_fold(dataset, folds)?
EDIT:
I have added a reference to the snippet my question is about: it is line 31 of this code.
Python's enumerate function
Enumeration is used to iterate through an iterable whilst keeping an integer count of the number of iterations. The count starts at 0 by default, so:
>>> for number, value in enumerate(["a", "b", "c"]):
...     print(number, value)
...
0 a
1 b
2 c
Python's zip function
The built-in function zip is used to combine two iterables like so:
>>> a = [1, 2]
>>> b = [3, 4]
>>> list(zip(a, b))
[(1, 3), (2, 4)]
When zip is given iterables of different lengths, it stops at the shortest one, so the result has as many tuples as the shortest iterable has items. So:
>>> a = [1, 2, 5, 6]
>>> b = [3, 4]
>>> list(zip(a, b))
[(1, 3), (2, 4)]
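If dropping the extra items is not what you want, the standard library's itertools.zip_longest pads the shorter iterables instead. A minimal sketch with the same two lists (the fillvalue of None is just the default, written out for clarity):

```python
from itertools import zip_longest

a = [1, 2, 5, 6]
b = [3, 4]

# zip stops at the shortest input; zip_longest keeps going and pads with fillvalue
truncated = list(zip(a, b))
padded = list(zip_longest(a, b, fillvalue=None))

print(truncated)
print(padded)
```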
Python's unpacking operator
Python uses * to unpack an iterable into separate positional arguments. Looking through the GitHub repository, it seems that k_fold returns a tuple with 3 elements. Unpacking that tuple with * passes its three elements to zip as three separate iterables, so zip can pair them up element by element.
Bonus example:
a = [1, 2, 5, 6, 8, 9, 10, 11]
b = [3, 4, 12, 13]
c = [14, 15]
for i in enumerate(zip(a, b, c)):
    print(i)
Output:
(0, (1, 3, 14))
(1, (2, 4, 15))
Each line has the same shape as fold, (train_idx, test_idx, val_idx).
I'm not sure what train_idx, test_idx and val_idx are in the code on GitHub. They are lists, but I don't know what they are filled with.
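To see the whole expression at work, here is a sketch with a stand-in for k_fold. The function fake_k_fold and the way it builds its index lists are assumptions made purely for illustration; the real k_fold in the linked repository may split the data differently. The only property relied on is that it returns a tuple of three lists, each with one entry per fold.

```python
def fake_k_fold(n_items, folds):
    """Hypothetical stand-in: returns (train, test, val), one entry per fold in each."""
    indices = list(range(n_items))
    train, test, val = [], [], []
    for k in range(folds):
        test_k = indices[k::folds]                # every folds-th index, offset k
        val_k = indices[(k + 1) % folds::folds]   # the next offset, wrapping around
        held_out = set(test_k) | set(val_k)
        train_k = [i for i in indices if i not in held_out]
        train.append(train_k)
        test.append(test_k)
        val.append(val_k)
    return train, test, val


# * turns the 3-tuple into three arguments: zip(train, test, val).
# zip then yields (train[0], test[0], val[0]), (train[1], test[1], val[1]), ...
# and enumerate numbers those triples 0, 1, 2.
for fold, (train_idx, test_idx, val_idx) in enumerate(zip(*fake_k_fold(9, 3))):
    print(fold, train_idx, test_idx, val_idx)
```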

Merge list together based on common values

I have a list
a = [(1,2),(1,3),(4,5),(6,7),(8,7)]
I want to merge the values in the lists in groups so I can get:
b = [(1,2,3),(4,5),(6,7,8)]
The order doesn't matter, but the group based on connectivity matters. Haven't been able to figure out a way to do it, any help is appreciated!
You can use set intersection to test if there's any value in common between two sets, and you can use set union to merge the two sets:
b = []
for p in map(set, a):
    for i, s in enumerate(b):
        if s & p:
            b[i] |= p
            break
    else:
        b.append(p)
b becomes:
[{1, 2, 3}, {4, 5}, {8, 6, 7}]
You can then convert it to your desired list of sorted tuples if you want:
b = [tuple(sorted(s)) for s in b]
b becomes:
[(1, 2, 3), (4, 5), (6, 7, 8)]
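One caveat with a single pass like this: a later pair can connect two groups that were already created separately, for example [(1,2),(3,4),(2,3)], and those groups would then stay unmerged. A union-find (disjoint-set) structure handles that case. The sketch below is one possible way to write it; merge_connected is a hypothetical helper name.

```python
def merge_connected(pairs):
    """Group values that are connected through shared members (union-find sketch)."""
    parent = {}

    def find(x):
        # Return the representative of x's group, halving paths along the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for group in pairs:
        first = find(group[0])
        for item in group[1:]:
            parent[find(item)] = first  # attach item's group to first's group

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), []).append(x)
    return sorted(tuple(sorted(g)) for g in groups.values())


print(merge_connected([(1, 2), (1, 3), (4, 5), (6, 7), (8, 7)]))
print(merge_connected([(1, 2), (3, 4), (2, 3)]))
```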
Some for loops will do the job:
a = [(1,2),(1,3),(4,5),(6,7),(8,7)]
# index pairs [i1, i2] of tuples that share at least one value
unions = [[i1,i2] for i1,x in enumerate(a) for i2,y in enumerate(a) for z in x if z in y and i2!=i1]
# drop the mirrored duplicates ([i2, i1])
for c in unions:
    if c[::-1] in unions: unions.remove(c[::-1])
# tuples that are not part of any union stay as they are
b = [e for i,e in enumerate(a) if i not in [y for x in unions for y in x]]
for c in unions:
    b.append(tuple(set(a[c[0]]+a[c[1]])))
print(sorted(b))

How to apply range() on subsequent elements of a list [duplicate]

Given a list
l = [1, 7, 3, 5]
I want to iterate over all pairs of consecutive list items (1,7), (7,3), (3,5), i.e.
for i in range(len(l) - 1):
    x = l[i]
    y = l[i + 1]
    # do something
I would like to do this in a more compact way, like
for x, y in someiterator(l): ...
Is there a way to do this using builtin Python iterators? I'm sure the itertools module should have a solution, but I just can't figure it out.
Just use zip
>>> l = [1, 7, 3, 5]
>>> for first, second in zip(l, l[1:]):
...     print(first, second)
...
1 7
7 3
3 5
If you use Python 2 (not suggested) you might consider using the izip function in itertools for very long lists, where you don't want zip to build a whole new list of pairs.
import itertools
for first, second in itertools.izip(l, l[1:]):
    ...
Look at pairwise in the itertools recipes: http://docs.python.org/2/library/itertools.html#recipes
Quoting from there:
from itertools import tee, izip  # import added; Python 2 names (on Python 3 use the builtin zip)
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return izip(a, b)
A General Version
A general version, which yields tuples of any given positive size, might look like this:
from itertools import islice, izip, tee  # Python 2 names, as in the recipe above
def nwise(iterable, n=2):
    iters = tee(iterable, n)
    for i, it in enumerate(iters):
        next(islice(it, i, i), None)  # advance the i-th copy by i items
    return izip(*iters)
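On Python 3, izip no longer exists and the builtin zip is already lazy. A sketch of the same idea under that assumption (the name nwise_py3 is made up here to avoid clashing with the version above):

```python
from itertools import islice, tee


def nwise_py3(iterable, n=2):
    """Yield overlapping n-tuples: (s0, ..., s(n-1)), (s1, ..., sn), ..."""
    iters = tee(iterable, n)  # n independent iterators over the same data
    for i, it in enumerate(iters):
        # islice(it, i, i) yields nothing but consumes i items from it,
        # so the i-th copy starts i positions ahead of the first
        next(islice(it, i, i), None)
    return zip(*iters)  # stops when the most advanced copy is exhausted


print(list(nwise_py3([1, 7, 3, 5])))
print(list(nwise_py3(range(5), 3)))
```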
I would create a generic grouper generator, like this
def grouper(input_list, n=2):
    for i in range(len(input_list) - (n - 1)):
        yield input_list[i:i+n]
Sample run 1
for first, second in grouper([1, 7, 3, 5, 6, 8], 2):
    print(first, second)
Output
1 7
7 3
3 5
5 6
6 8
Sample run 2
for first, second, third in grouper([1, 7, 3, 5, 6, 8], 3):
    print(first, second, third)
Output
1 7 3
7 3 5
3 5 6
5 6 8
Generalizing sberry's approach to nwise with comprehension:
def nwise(lst, k=2):
    return list(zip(*[lst[i:] for i in range(k)]))
E.g.
nwise(list(range(10)), 3)
[(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7), (6, 7, 8), (7, 8, 9)]
A simple means to do this without unnecessary copying is a generator that stores the previous element.
def pairs(iterable):
    """Yield elements pairwise from iterable as (i0, i1), (i1, i2), ..."""
    it = iter(iterable)
    try:
        prev = next(it)
    except StopIteration:
        return
    for item in it:
        yield prev, item
        prev = item
Unlike index-based solutions, this works on any iterable, including those for which indexing is not supported (e.g. generator) or slow (e.g. collections.deque).
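For what it's worth, Python 3.10 added itertools.pairwise to the standard library, which does the same thing for any iterable. A small sketch that uses it when available and otherwise falls back to the tee-based recipe (pairwise_compat is a hypothetical wrapper name):

```python
import itertools


def pairwise_compat(iterable):
    """Return an iterator of overlapping pairs, on old and new Python 3 versions."""
    if hasattr(itertools, "pairwise"):  # Python 3.10+
        return itertools.pairwise(iterable)
    # fallback: the tee-based recipe from the itertools documentation
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)


l = [1, 7, 3, 5]
for x, y in pairwise_compat(l):
    print(x, y)
```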
You could use zip.
>>> list(zip(range(5), range(1, 6)))
[(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
Just like a zipper, it creates pairs. So, to pair each item of your list with the next one, you get:
>>> l = [1,7,3,5]
>>> list(zip(l[:-1], l[1:]))
[(1, 7), (7, 3), (3, 5)]
Then iterating goes like
for x, y in zip(l[:-1], l[1:]):
    pass
If you wanted something inline but not terribly readable, here's another solution that makes use of generators. I expect it's also not the best performance-wise :-/
Convert the list into a generator, with a tweak to end before the last item:
gen = (x for x in l[:-1])
Convert it into pairs:
[(next(gen), x) for x in l[1:]]
That's all you need.

Python equivalent of R "split"-function

In R, you could split a vector according to the factors of another vector:
> a <- 1:10
[1] 1 2 3 4 5 6 7 8 9 10
> b <- rep(1:2,5)
[1] 1 2 1 2 1 2 1 2 1 2
> split(a,b)
$`1`
[1] 1 3 5 7 9
$`2`
[1] 2 4 6 8 10
That is, it groups a list (in Python terms) according to the values of another list (in the order of the factors).
Is there anything handy in Python like that, apart from the itertools.groupby approach?
From your example, it looks like each element in b gives the 1-indexed number of the list in which the corresponding element of a will be stored. Python lacks the automatic numeric variables that R seems to have, so we'll return a tuple of lists. If you can use zero-indexed lists and you only need two of them (i.e., where your R use case has only the values 1 and 2, in Python they'll be 0 and 1), the result should look like this:
>>> a = range(1, 11)
>>> b = [0,1] * 5
>>> split(a, b)
([1, 3, 5, 7, 9], [2, 4, 6, 8, 10])
Then you can use itertools.compress:
import itertools
def split(x, f):
    # group 0 first (where f is falsy), then group 1 (where f is truthy)
    return list(itertools.compress(x, (not i for i in f))), list(itertools.compress(x, f))
If you need more general input (multiple numbers), something like the following will return an n-tuple:
def split(x, f):
    count = max(f) + 1
    return tuple(list(itertools.compress(x, (el == i for el in f))) for i in range(count))
>>> split([1,2,3,4,5,6,7,8,9,10], [0,1,1,0,2,3,4,0,1,2])
([1, 4, 8], [2, 3, 9], [5, 10], [6], [7])
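In case itertools.compress is unfamiliar: it keeps the items of its first argument whose matching selector is truthy. A short sketch with made-up data, showing how inverting the selectors gives the other half:

```python
from itertools import compress

data = ["a", "b", "c", "d", "e", "f"]
selectors = [0, 1, 0, 1, 0, 1]

# items whose selector is truthy (the 1s)
ones = list(compress(data, selectors))
# items whose selector is falsy (the 0s), by inverting each selector
zeros = list(compress(data, (not s for s in selectors)))

print(zeros, ones)
```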
Edit: warning, this is a groupby solution, which is not what OP asked for, but it may be of use to someone looking for a less specific way to split the R way in Python.
Here's one way with itertools.
import itertools
# make your sample data
a = range(1,11)
# list(...) is needed on Python 3, where zip returns an iterator that cannot be indexed
b = list(zip(*zip(range(len(a)), itertools.cycle((1,2)))))[1]
{k: list(zip(*g))[1] for k, g in itertools.groupby(sorted(zip(b,a)), lambda x: x[0])}
# {1: (1, 3, 5, 7, 9), 2: (2, 4, 6, 8, 10)}
This gives you a dictionary, which is analogous to the named list that you get from R's split.
As a long-time R user I was wondering how to do the same thing. It's a very handy function for tabulating vectors. This is what I came up with:
a = [1,2,3,4,5,6,7,8,9,10]
b = [1,2,1,2,1,2,1,2,1,2]
from collections import defaultdict
def split(x, f):
    res = defaultdict(list)
    for v, k in zip(x, f):
        res[k].append(v)
    return res
>>> split(a, b)
defaultdict(list, {1: [1, 3, 5, 7, 9], 2: [2, 4, 6, 8, 10]})
You could try:
a = [1,2,3,4,5,6,7,8,9,10]
b = [1,2,1,2,1,2,1,2,1,2]
split_1 = [a[k] for k in (i for i,j in enumerate(b) if j == 1)]
split_2 = [a[k] for k in (i for i,j in enumerate(b) if j == 2)]
results in:
In [22]: split_1
Out[22]: [1, 3, 5, 7, 9]
In [24]: split_2
Out[24]: [2, 4, 6, 8, 10]
To generalise this, you can simply iterate over the unique elements in b:
splits = {}
for index in set(b):
    splits[index] = [a[k] for k in (i for i,j in enumerate(b) if j == index)]
