If we have:
a = ['a', 'aa', 'aaa']
b = ['b', 'bb', 'bbb']
for i, (x, y) in enumerate(zip(a, b)):
    print(i, x, y)
then the code prints:
0 a b
1 aa bb
2 aaa bbb
To iterate over all elements of the two lists, they must have the same size.
Now, if we have the following snippet:
for fold, (train_idx, test_idx, val_idx) in enumerate(zip(*k_fold(dataset, folds))):
pass
where len(dataset) = 1000 and folds = 3, how does the code work in terms of *k_fold(dataset, folds)?
EDIT:
I have added a reference to the snippet my question is about: it is line 31 of this code.
Python's enumerate function
Enumeration is used to iterate through an iterable whilst keeping an integer count of the number of iterations, starting from 0 by default, so:
>>> for number, value in enumerate(["a", "b", "c"]):
...     print(number, value)
0 a
1 b
2 c
Python's zip function
The built-in function zip is used to combine two iterables like so:
>>> a = [1, 2]
>>> b = [3, 4]
>>> list(zip(a, b))
[(1, 3), (2, 4)]
When zip is provided with iterables of different lengths, it returns a zip object with the length of the shortest iterable. So:
>>> a = [1, 2, 5, 6]
>>> b = [3, 4]
>>> list(zip(a, b))
[(1, 3), (2, 4)]
Python's unpacking operator
Python uses the * operator to unpack an iterable into separate arguments. Looking through the GitHub repository, it seems that k_fold returns a tuple with 3 elements, so *k_fold(dataset, folds) passes those three sequences to zip as separate arguments.
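As a sketch of that mechanism (the stand-in k_fold below is made up; the real index contents depend on the repository's implementation), zip(*...) pairs up the i-th train/test/validation entries per fold:
# hypothetical stand-in: assume k_fold returns three lists of per-fold indices
def k_fold(dataset, folds):
    train = [[0, 1], [2, 3], [4, 5]]
    test = [[6], [7], [8]]
    val = [[9], [10], [11]]
    return train, test, val

# zip(*k_fold(...)) pairs the i-th train/test/val entries per fold
for fold, (train_idx, test_idx, val_idx) in enumerate(zip(*k_fold(None, 3))):
    print(fold, train_idx, test_idx, val_idx)
# 0 [0, 1] [6] [9]
# 1 [2, 3] [7] [10]
# 2 [4, 5] [8] [11]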
bonus example:
a = [1, 2, 5, 6, 8, 9, 10, 11]
b = [3, 4, 12, 13]
c = [14, 15]
for i in enumerate(zip(a, b, c)):
    print(i)
output:
(0, (1, 3, 14))
(1, (2, 4, 15)) -----> like they are fold, (train_idx, test_idx, val_idx)
not sure what train_idx, test_idx, val_idx are in the code on GitHub:
train_idx, test_idx and val_idx are lists, but I don't know what they are filled with!
Related
I have two unsorted lists of integers without duplicates. Both contain the same elements, but not in the same order, and I want to find the indices of the common elements between the two lists in the lowest time complexity. For example
a = [1, 8, 5, 3, 4]
b = [5, 4, 1, 3, 8]
the output should be:
list1[0] With list2[2]
list1[1] With list2[4]
list1[2] With list2[0]
and so on
I have thought of using set.intersection and then finding the index using the index function, but I didn't know how to print the output in the right way.
this is what I've tried
b = set(list1).intersection(list2)
ina = [list1.index(x) for x in b]
inb = [list2.index(x) for x in b]
print(ina, inb)
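For what it's worth, the indices computed in the attempt above already line up pairwise (both comprehensions walk the same set), so they could be printed in the desired format like this; note that list.index makes this quadratic, which the answers below avoid:
list1 = [1, 8, 5, 3, 4]
list2 = [5, 4, 1, 3, 8]

common = set(list1).intersection(list2)
ina = [list1.index(x) for x in common]
inb = [list2.index(x) for x in common]
for i1, i2 in sorted(zip(ina, inb)):  # sort so the output follows list1 order
    print(f"list1[{i1}] With list2[{i2}]")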
To find them in linear time you should use some kind of hashing. The easiest way in Python is to use a dict:
list1 = [1, 8, 5, 3, 4]
list2 = [5, 4, 1, 3, 8]
common = set(list1).intersection(list2)
dict2 = {e: i for i, e in enumerate(list2) if e in common}
result = [(i, dict2[e]) for i, e in enumerate(list1) if e in common]
The result will be
[(0, 2), (1, 4), (2, 0), (3, 3), (4, 1)]
You can use something like this to format and print it:
for i1, i2 in result:
print(f"list1[{i1}] with list2[{i2}]")
you get:
list1[0] with list2[2]
list1[1] with list2[4]
list1[2] with list2[0]
list1[3] with list2[3]
list1[4] with list2[1]
Create a dictionary that maps elements of one list to their indexes. Then update it to have the indexes of the corresponding elements of the other list. Then any element that has two indices is in the intersection.
intersect = {x: [i] for i, x in enumerate(list1)}
for i, x in enumerate(list2):
    if x in intersect:
        intersect[x].append(i)

for l in intersect.values():
    if len(l) == 2:
        print(f'list1[{l[0]}] with list2[{l[1]}]')
a = [1, 8, 5, 3, 4]
b = [5, 4, 1, 3, 8]
e2i = {e : i for (i, e) in enumerate(b)}
for i, e in enumerate(a):
    if e in e2i:
        print('list1[%d] with list2[%d]' % (i, e2i[e]))
Building on the excellent answers here, you can squeeze a little more juice out of the lemon by not bothering to record the indices of a. (Those indices are just 0 through len(a) - 1 anyway and you can add them back later if needed.)
e2i = {e : i for (i, e) in enumerate(b)}
output = [e2i.get(e) for e in a]
output
# [2, 4, 0, 3, 1]
With len(a) == len(b) == 5000 on my machine this code runs a little better than twice as fast as Björn Lindqvist's code (after I modified his code to store the output rather than print it).
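For reference, a rough sketch of how such a comparison could be run with timeit (the pairing variant is adapted from the dict-based answers above to store its output, as described; input sizes follow the text and exact timings will vary):
import random
import timeit

n = 5000
b = random.sample(range(10 * n), n)
a = random.sample(b, n)  # same elements, shuffled

def store_pairs():
    # earlier dict-based approach, storing pairs instead of printing
    e2i = {e: i for i, e in enumerate(b)}
    return [(i, e2i[e]) for i, e in enumerate(a) if e in e2i]

def indices_only():
    # this answer: skip recording the indices of a
    e2i = {e: i for i, e in enumerate(b)}
    return [e2i.get(e) for e in a]

print(timeit.timeit(store_pairs, number=200))
print(timeit.timeit(indices_only, number=200))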
I have a function foo that returns an array with the shape (1000, 2)
how can I split it into two arrays a and b, each of shape (1000,)?
I'm looking for something like this:
a;b = foo()
I'm looking for an answer that can easily generalize to the case in which the shape is (1000, 5) or so.
The zip(*...) idiom transposes a nested (list-of-lists) Python list:
x = [[1,2], [3,4], [5,6]]
# get columns
a, b = zip(*x) # zip(*foo())
# a, b = map(list, zip(*x)) # if you prefer lists over tuples
a
# (1, 3, 5)
# get rows
a, b, c = x
a
# [1, 2]
Transpose and unpack?
a, b = foo().T
>>> a, b = np.arange(20).reshape(-1, 2).T
>>> a
array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
>>> b
array([ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19])
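The same transpose-and-unpack idiom covers the (1000, 5) case from the question; for example:
>>> a, b, c, d, e = np.arange(20).reshape(-1, 5).T
>>> a
array([ 0,  5, 10, 15])
>>> e
array([ 4,  9, 14, 19])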
You can use numpy.hsplit.
x = np.arange(12).reshape((3, 4))
np.hsplit(x, x.shape[1])
This returns a list of subarrays. Note that in the case of a 2d input, the subarrays will have shape (n, 1), unless you wrap a function around it to squeeze them to 1d:
def split_1d(arr_2d):
    """Split a 2d NumPy array on its columns."""
    split = np.hsplit(arr_2d, arr_2d.shape[1])
    split = [np.squeeze(arr) for arr in split]
    return split
a, b, c, d = split_1d(x)
a
# array([0, 4, 8])
d
# array([ 3, 7, 11])
You could just use list comprehensions, e.g.
(a, b) = ([i[0] for i in mylist], [i[1] for i in mylist])
To generalise, you could use a comprehension within a comprehension:
(a, b, c, d, e) = ([row[i] for row in mylist] for i in range(5))
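For instance, with a hypothetical five-column mylist this unpacks each column into its own list:
mylist = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
(a, b, c, d, e) = ([row[i] for row in mylist] for i in range(5))
print(a)  # [1, 6]
print(e)  # [5, 10]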
You can do this simply by using zip function like:
def foo(mylist):
    return zip(*mylist)
Now call foo with your list and unpack as many names as there are columns in mylist:
mylist = [[1, 2], [3, 4], [5, 6]]
a, b = foo(mylist)
# a = (1, 3, 5)
# b = (2, 4, 6)
So this is a little nuts, but if you want to assign different letters to each sub-array in your array, and do so for any number of sub-arrays (up to 26 because alphabet), you could do:
import string
letters = list(string.ascii_lowercase) # get all of the lower-case letters
arr_dict = {k: v for k, v in zip(letters, foo())}
or more simply (for the last line):
arr_dict = dict(zip(letters, foo()))
Then you can access each individual element as arr_dict['a'] or arr_dict['b']. This feels a little mad-scientist-ey to me, but I thought it was fun.
aList = [2, 1, 4, 3, 5]
aList.sort()
=[1, 2, 3, 4, 5]
del aList[2]
=[1, 2, 4, 5]
**unsort the list back to original sequence with '3' deleted**
=[2, 1, 4, 5]
In reality I have a list of tuples that contain (Price, Quantity, Total).
I want to sort the list, allow the user to delete items in the list and
then put it back in the original order minus the deleted items.
One thing to note is that the values in the tuples can repeat in the list,
such as:
aList = [(4.55, 10, 45.5), (4.55, 10, 45.5), (1.99, 3, 5.97), (1.99, 1, 1.99)]
You cannot unsort the list but you could keep the original unsorted index to restore positions.
E.g.
from operator import itemgetter
aList = [(4.55, 10, 45.5), (4.55, 10, 45.5), (1.99, 3, 5.97), (1.99, 1, 1.99)]
# In keyList:
# * every element has a unique id (it also saves the original position in aList)
# * list is sorted by some criteria specific to your records
keyList = sorted(enumerate(aList), key=itemgetter(1))

# User wants to delete item 1
for i, (key, record) in enumerate(keyList):
    if key == 1:
        del keyList[i]
        break

# "Unsort" the list
theList = sorted(keyList, key=itemgetter(0))

# We don't need the unique id anymore
result = [record for key, record in theList]
As you can see this works with duplicate values.
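With the sample aList above, the restored list should come out with one duplicate record removed and the original order intact:
print(result)
# [(4.55, 10, 45.5), (1.99, 3, 5.97), (1.99, 1, 1.99)]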
Unsorting can be done
This approach is like others - the idea is to keep the original indices to restore the positions. I wanted to add a clearer example on how this is done.
In the example below, we keep track of the original positions of the items in a by associating them with their list index.
>>> a = [4, 3, 2, 1]
>>> b = [(a[i], i) for i in range(len(a))]
>>> b
[(4, 0), (3, 1), (2, 2), (1, 3)]
b serves as a mapping between the list values and their indices in the unsorted list.
Now we can sort b. Below, each item of b is sorted by the first tuple member, which is the corresponding value in the original list.
>>> c = sorted(b)
>>> c
[(1, 3), (2, 2), (3, 1), (4, 0)]
There it is... sorted.
Going back to the original order requires another sort, except using the second tuple item as the key.
>>> d = sorted(c, key=lambda t: t[1])
>>> d
[(4, 0), (3, 1), (2, 2), (1, 3)]
>>>
>>> d == b
True
And now it's back in its original order.
One use for this could be to transform a list of non-sequential values into their ordinal values while maintaining the list order. For instance, a sequence like [1034, 343, 5, 72, 8997] could be transformed to [3, 2, 0, 1, 4].
>>> # Example for converting a list of non-contiguous
>>> # values in a list into their relative ordinal values.
>>>
>>> def ordinalize(a):
...     idxs = list(range(len(a)))
...     b = [(a[i], i) for i in idxs]
...     b.sort()
...     c = [(*b[i], i) for i in idxs]
...     c.sort(key=lambda item: item[1])
...     return [c[i][2] for i in idxs]
...
>>> ordinalize([58, 42, 37, 25, 10])
[4, 3, 2, 1, 0]
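As a check against the [1034, 343, 5, 72, 8997] example mentioned above:
>>> ordinalize([1034, 343, 5, 72, 8997])
[3, 2, 0, 1, 4]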
Same operation
>>> def ordinalize(a):
...     idxs = range(len(a))
...     a = sorted((a[i], i) for i in idxs)
...     a = sorted(((*a[i], i) for i in idxs),
...                key=lambda item: item[1])
...     return [a[i][2] for i in idxs]
You can't really do an "unsort"; the best you can do is:
aList = [2, 1, 4, 3, 5]
aList.remove(sorted(aList)[2])
>>> print(aList)
[2, 1, 4, 5]
Try this to unsort a sorted list
import random
li = list(range(101))
random.shuffle(li)
Here's how I recommend to sort a list, do something, then unsort back to the original ordering:
import numpy as np

# argsort of the sort order gives the inverse permutation,
# so applying argsort twice lets us undo the sorting.
sorter = np.argsort(keys)
unsorter = np.argsort(sorter)
sorted_keys = np.array(keys)[sorter]
result = do_a_thing_that_preserves_order(sorted_keys)
unsorted_result = np.array(result)[unsorter]
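A tiny self-contained illustration of that round trip (the sample keys and the doubling step are made up for demonstration):
import numpy as np

keys = np.array([30, 10, 20])
sorter = np.argsort(keys)        # indices that sort keys: [1, 2, 0]
unsorter = np.argsort(sorter)    # inverse permutation: [2, 0, 1]
sorted_keys = keys[sorter]       # [10, 20, 30]
result = sorted_keys * 2         # stands in for the order-preserving step
print(result[unsorter])          # [60 20 40], doubled keys in the original order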
I had the same use case and I found an easy solution for that, which is basically to randomize the list:
import random
sorted_list = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k']
unsorted_list = random.sample(sorted_list, len(sorted_list))
In R, you could split a vector according to the factors of another vector:
> a <- 1:10
[1] 1 2 3 4 5 6 7 8 9 10
> b <- rep(1:2,5)
[1] 1 2 1 2 1 2 1 2 1 2
> split(a,b)
$`1`
[1] 1 3 5 7 9
$`2`
[1] 2 4 6 8 10
That is, grouping a list (in Python terms) according to the values of another list (following the order of the factors).
Is there anything handy in Python like that, apart from the itertools.groupby approach?
From your example, it looks like each element in b holds the 1-indexed list into which the corresponding element of a should go. Python lacks the automatic numeric variables that R seems to have, so we'll return a tuple of lists. If you can use zero-indexed lists and only need two of them (i.e., for your R use case, 1 and 2 are the only values; in Python they'll be 0 and 1):
>>> a = range(1, 11)
>>> b = [0,1] * 5
>>> split(a, b)
([1, 3, 5, 7, 9], [2, 4, 6, 8, 10])
Then you can use itertools.compress:
import itertools

def split(x, f):
    return list(itertools.compress(x, f)), list(itertools.compress(x, (not i for i in f)))
If you need more general input (multiple numbers), something like the following will return an n-tuple:
def split(x, f):
    count = max(f) + 1
    return tuple(list(itertools.compress(x, (el == i for el in f))) for i in range(count))
>>> split([1,2,3,4,5,6,7,8,9,10], [0,1,1,0,2,3,4,0,1,2])
([1, 4, 8], [2, 3, 9], [5, 10], [6], [7])
Edit: warning, this is a groupby solution, which is not what OP asked for, but it may be of use to someone looking for a less specific way to split the R way in Python.
Here's one way with itertools.
import itertools
# make your sample data
a = list(range(1, 11))
b = list(zip(*zip(range(len(a)), itertools.cycle((1, 2)))))[1]
{k: tuple(zip(*g))[1] for k, g in itertools.groupby(sorted(zip(b, a)), lambda x: x[0])}
# {1: (1, 3, 5, 7, 9), 2: (2, 4, 6, 8, 10)}
This gives you a dictionary, which is analogous to the named list that you get from R's split.
As a long time R user I was wondering how to do the same thing. It's a very handy function for tabulating vectors. This is what I came up with:
a = [1,2,3,4,5,6,7,8,9,10]
b = [1,2,1,2,1,2,1,2,1,2]
from collections import defaultdict
def split(x, f):
    res = defaultdict(list)
    for v, k in zip(x, f):
        res[k].append(v)
    return res
>>> split(a, b)
defaultdict(list, {1: [1, 3, 5, 7, 9], 2: [2, 4, 6, 8, 10]})
You could try:
a = [1,2,3,4,5,6,7,8,9,10]
b = [1,2,1,2,1,2,1,2,1,2]
split_1 = [a[k] for k in (i for i,j in enumerate(b) if j == 1)]
split_2 = [a[k] for k in (i for i,j in enumerate(b) if j == 2)]
results in:
In [22]: split_1
Out[22]: [1, 3, 5, 7, 9]
In [24]: split_2
Out[24]: [2, 4, 6, 8, 10]
To make this generalise you can simply iterate over the unique elements in b:
splits = {}
for index in set(b):
    splits[index] = [a[k] for k in (i for i, j in enumerate(b) if j == index)]
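With the a and b above, this would be expected to give:
print(splits)
# {1: [1, 3, 5, 7, 9], 2: [2, 4, 6, 8, 10]}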
Say I have an array of values:
a = np.array([1,5,4,2,4,3,1,2,4])
and three 'sum' values:
b = 10
c = 9
d = 7
Is there a way to group the values in a into sets whose values sum to b, c and d respectively? For example:
b: [5,2,3]
c: [4,4,1]
d: [4,2,1]
b: [5,4,1]
c: [2,4,3]
d: [4,2,1]
b: [4,2,4]
c: [5,4]
d: [1,1,2,3]
Note that the sum of b, c and d equals the sum of a (26). Perhaps this operation already has a name?
Here's a naive implementation using itertools
from itertools import chain, combinations
def group(n, iterable):
    s = list(iterable)
    return [c for c in chain.from_iterable(combinations(s, r)
                                           for r in range(len(s) + 1))
            if sum(c) == n]
group(5, range(5))
yields
[(1, 4), (2, 3), (0, 1, 4), (0, 2, 3)]
Note, this probably will be very slow for large lists because we're essentially constructing and filtering through the power set of that list.
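To put a number on that: the search enumerates every subset, so there are 2**len(s) candidates to build and filter. A quick check using the same itertools pattern:
from itertools import chain, combinations

s = [1, 5, 4, 2, 4, 3, 1, 2, 4]
n_subsets = sum(1 for _ in chain.from_iterable(
    combinations(s, r) for r in range(len(s) + 1)))
print(n_subsets, 2 ** len(s))  # 512 512, the search space doubles per element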
You could use this for
sum_vals = [10, 9, 7]
a = [1, 5, 4, 2, 4, 3, 1, 2, 4]
list(map(lambda x: group(x, a), sum_vals))
and then zip them together.
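To make that last step concrete, here is a small sketch reusing group, a and sum_vals from above; the example tuples in the comments are actual outputs of group:
candidates = list(map(lambda x: group(x, a), sum_vals))
# candidates[0]: every subset of a summing to 10, e.g. (5, 2, 3) and (5, 4, 1)
# candidates[1]: every subset summing to 9, e.g. (5, 4) and (4, 4, 1)
# candidates[2]: every subset summing to 7, e.g. (4, 2, 1) and (4, 3)
Picking one tuple per target so that together they use each element of a exactly once (the partitions shown in the question) is the combining step that is left open here.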