List comprehension syntax for pool.startmap_async()

List comprehension syntax for pool.startmap_async() - python

Looking at an example here:
https://www.machinelearningplus.com/python/parallel-processing-python/
There is a function definition which is to be parallelised:
# Step 1: Redefine, to accept `i`, the iteration number
def howmany_within_range2(i, row, minimum, maximum):
"""Returns how many numbers lie within `maximum` and `minimum` in a given `row`"""
count = 0
for n in row:
if minimum <= n <= maximum:
count = count + 1
return (i, count)
The starmap_async example is give as below:
results = pool.starmap_async(howmany_within_range2, [(i, row, 4, 8) for i, row in enumerate(data)]).get()
I am a bit confused by this syntax here, particularly the "i" parameter and how this enumerate syntax works.
Also the apply_asyncy() example uses a pool.join() statement, but the map_async() statement doesn't use one?

Breaking this down a little,
data = [
[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
]
arguments = [(i, row, 4, 8) for i, row in enumerate(data)]
print(arguments)
outputs (formatted)
[
(0, [1, 2, 3], 4, 8),
(1, [4, 5, 6], 4, 8),
(2, [7, 8, 9], 4, 8),
]
which are the tuples howmany_within_range2 will be executed with, i.e.
howmany_within_range2(0, [1, 2, 3], 4, 8)
howmany_within_range2(1, [4, 5, 6], 4, 8)
howmany_within_range2(2, [7, 8, 9], 4, 8)
but in parallel.
enumerate is used here to easily access the row index of the row in the data list; otherwise you'd just get a bunch of results without an easy way to associate them with the original data rows.

Related

Python - How to find the longest sequence in a integer list of neighbor numbers

How I find the longest sub-list in list of number pairs that has the difference of 1.
So "number neighbors" could be element 1 and 2 or element 7 and 6.
If the list is [7, 1, 2, 5, 7, 6, 5, 6, 3, 4, 2, 1, 0],
Then the desired output should be 4. The sub-list would be then [7, 6, 5, 6].
This is what I have for now. The for loop formula is really broken and I don't know how to solve this.
list = [7, 1, 2, 5, 7, 6, 5, 6, 3, 4, 2, 1, 0]
sublist = []
for i in list:
if list[i] - list[i+1] == 1 or list[i] - list[i+1] == -1:
sublist.append(i)
print(sublist)
print(len(sublist))

more-itertools makes it simple:
from more_itertools import split_when
lst = [7, 1, 2, 5, 7, 6, 5, 6, 3, 4, 2, 1, 0]
print(max(split_when(lst, lambda x, y: abs(x - y) != 1), key=len))

Its best to break these types of problems up into their parts
the main problem here is to get all sequential sequences
def get_sequences_iter(an_array):
# start a new sequence with first value (or empty)
sequence = an_array[:1]
# check each index with the value that comes before
for idx in range(1,len(an_array)):
if an_array[idx] - an_array[idx-1] in {1,-1}:
# this is part of a run append it
sequence.append(an_array[idx])
else:
# run is broken
yield sequence
# start a new run
sequence = [an_array[idx]]
#capture final sequence
yield sequence
once you have this you can get all the runs in a list in O(n) time
sequences = get_sequences_iter([7, 1, 2, 5, 7, 6, 5, 6, 3, 4, 2, 1, 0])
for sequence in sequences:
print(sequence)
hopefully with this information you can figure out how to solve your actual question
but just in case
print(max(sequences,key=len))

2D Vectorization of unique values per row with condition

Consider the array and function definition shown:
import numpy as np
a = np.array([[2, 2, 5, 6, 2, 5],
[1, 5, 8, 9, 9, 1],
[0, 4, 2, 3, 7, 9],
[1, 4, 1, 1, 5, 1],
[6, 5, 4, 3, 2, 1],
[3, 6, 3, 6, 3, 6],
[0, 2, 7, 6, 3, 4],
[3, 3, 7, 7, 3, 3]])
def grpCountSize(arr, grpCount, grpSize):
count = [np.unique(row, return_counts=True) for row in arr]
valid = [np.any(np.count_nonzero(row[1] == grpSize) == grpCount) for row in count]
return valid
The point of the function is to return the rows of array a that have exactly grpCount groups of elements that each hold exactly grpSize identical elements.
For example:
# which rows have exactly 1 group that holds exactly 2 identical elements?
out = a[grpCountSize(a, 1, 2)]
As expected, the code outputs out = [[2, 2, 5, 6, 2, 5], [3, 3, 7, 7, 3, 3]].
The 1st output row has exactly 1 group of 2 (ie: 5,5), while the 2nd output row also has exactly 1 group of 2 (ie: 7,7).
Similarly:
# which rows have exactly 2 groups that each hold exactly 3 identical elements?
out = a[grpCountSize(a, 2, 3)]
This produces out = [[3, 6, 3, 6, 3, 6]], because only this row has exactly 2 groups each holding exactly 3 elements (ie: 3,3,3 and 6,6,6)
PROBLEM: My actual arrays have just 6 columns, but they can have many millions of rows. The code works perfectly as intended, but it is VERY SLOW for long arrays. Is there a way to speed this up?

np.unique sorts the array which makes it less efficient for your purpose. Use np.bincount and that way you most likely will save some time(depending on your array shape and values in the array). You also will not need np.any anymore:
def grpCountSize(arr, grpCount, grpSize):
count = [np.bincount(row) for row in arr]
valid = [np.count_nonzero(row == grpSize) == grpCount for row in count]
return valid
Another way that might even save more time is using same number of bins for all rows and create one array:
def grpCountSize(arr, grpCount, grpSize):
m = arr.max()
count = np.stack([np.bincount(row, minlength=m+1) for row in arr])
return (count == grpSize).sum(1)==grpCount
Another yet upgrade is to use vectorized 2D bin count from this post. For example (note that Numba solutions tested in the post above is faster. I just provided the numpy solution for example. You can replace the function with any of the suggested ones in the post linked above):
def grpCountSize(arr, grpCount, grpSize):
count = bincount2D_vectorized(arr)
return (count == grpSize).sum(1)==grpCount
#from the post above
def bincount2D_vectorized(a):
N = a.max()+1
a_offs = a + np.arange(a.shape[0])[:,None]*N
return np.bincount(a_offs.ravel(), minlength=a.shape[0]*N).reshape(-1,N)
output of all solutions above:
a[grpCountSize2(a, 1, 2)]
#array([[2, 2, 5, 6, 2, 5],
# [3, 3, 7, 7, 3, 3]])

Finding Indices for Repeat Sequences in NumPy Array

This is a follow up to a previous question. If I have a NumPy array [0, 1, 2, 2, 3, 4, 2, 2, 5, 5, 6, 5, 5, 2, 2], for each repeat sequence (starting at each index), is there a fast way to to then find all matches of that repeat sequence and return the index for those matches?
Here, the repeat sequences are [2, 2] and [5, 5] (note that the length of the repeat is specified by the user but will be the same length and can be much greater than 2). The repeats can be found at [2, 6, 8, 11, 13] via:
def consec_repeat_starts(a, n):
N = n-1
m = a[:-1]==a[1:]
return np.flatnonzero(np.convolve(m,np.ones(N, dtype=int))==N)-N+1
But for each unique type of repeat sequence (i.e., [2, 2] and [5, 5]) I want to return something like the repeat followed by the indices for where the repeat is located:
[([2, 2], [2, 6, 13]), ([5, 5], [8, 11])]
Update
Additionally, given the repeat sequence, can you return the results from a second array. So, look for [2, 2] and [5, 5] in:
[2, 2, 5, 5, 1, 4, 9, 2, 5, 5, 0, 2, 2, 2]
And the function would return:
[([2, 2], [0, 11, 12]), ([5, 5], [2, 8]))]

Here's a way to do so -
def group_consec(a, n):
idx = consec_repeat_starts(a, n)
b = a[idx]
sidx = b.argsort()
c = b[sidx]
cut_idx = np.flatnonzero(np.r_[True, c[:-1]!=c[1:],True])
idx_s = idx[sidx]
indices = [idx_s[i:j] for (i,j) in zip(cut_idx[:-1],cut_idx[1:])]
return c[cut_idx[:-1]], indices
# Perform lookup in another array, b
n = 2
v_a,indices_a = group_consec(a, n)
v_b,indices_b = group_consec(b, n)
idx = np.searchsorted(v_a, v_b)
idx[idx==len(v_a)] = 0
valid_mask = v_a[idx]==v_b
common_indices = [j for (i,j) in zip(valid_mask,indices_b) if i]
common_val = v_b[valid_mask]
Note that for simplicity and ease of usage, the first output arg off group_consec has the unique values per sequence. If you need them in (val, val,..) format, simply replicate at the end. Similarly, for common_val.

python - iterate over all pairs of consecutive items with gaps

Given a list:
mylist = [1,6,4,9,2]
I would like to return all pairs of consecutive items within a window.
For example, if I want pairs of 3 consecutive items, I could do:
items=3
for x in range(0,len(mylist)-items+1):
print(mylist[x:x+items])
Which outputs:
[1, 6, 4]
[6, 4, 9]
[4, 9, 2]
This assumes the window size is also 3, so it's only scanning 3 indexes at a time.
If I instead want to return all pairs of 3 consecutive items in a window of 4, I would want:
[1, 6, 4]
[1, 6, 9]
[1, 4, 9]
[6, 4, 9]
[6, 4, 2]
[6, 9, 2]
[4, 9, 2]
Is there a simple method to produce these pairs?
Edit to add to Alex's answer below:
I ended up using combinations to identify the indexes, then only selecting the indexes starting with zero, like this:
from itertools import combinations
def colocate(mylist,pairs=4,window=6):
x = list(combinations(range(window),pairs))
y = [z for z in x if z[0]==0]
for item in y:
print(item)

"Combination" is a concept in math related to your question. It does not care about "window of 4" though.
from itertools import combinations
l = [1,6,4,9,2]
for combination in combinations(l, 3):
print(combination)
(1, 6, 4)
(1, 6, 9)
(1, 6, 2)
(1, 4, 9)
(1, 4, 2)
(1, 9, 2)
(6, 4, 9)
(6, 4, 2)
(6, 9, 2)
(4, 9, 2)
I'm curious why do you want to have a window of 4.
Maybe there is a better way to solve your task at hand?

One fairly easy way to do it is to think about it in terms of the index rather than the list items itself. Start with:
import itertools
list(itertools.combinations(range(len(mylist)), 3)
This gets you all the possible index triple combinations in a list with the length of your list. Now you want to filter them to exclude any where the last index is 4 or more away from the first:
list(filter(lambda seq: (seq[-1] - seq[0]) <= 4, itertools.combinations(l, 3)))
This gets you the indeces you want. So now you can get the triples you need based on the indeces:
[[mylist[i] for i in seq] for seq in filter(lambda seq: (seq[-1] - seq[0]) < 4, itertools.combinations(l, 3))]
which produces:
[[1, 6, 4], [1, 6, 9], [1, 4, 9], [6, 4, 9], [6, 4, 2], [6, 9, 2], [4, 9, 2]]

This gets pretty close. There will be some duplicates produced, but that's what set(...) is for at the end... should give you some ideas anyway.
from itertools import combinations, islice, chain
# from itertools examples
def window(seq, n=2):
"Returns a sliding window (of width n) over data from the iterable"
" s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ... "
it = iter(seq)
result = tuple(islice(it, n))
if len(result) == n:
yield result
for elem in it:
result = result[1:] + (elem,)
yield result
mylist = [1,6,4,9,2]
set(chain.from_iterable(combinations(w, 3) for w in window(mylist, 4)))
{(1, 4, 9), (1, 6, 4), (1, 6, 9), (4, 9, 2), (6, 4, 2), (6, 4, 9), (6, 9, 2)}

List of tuples from heirarchical nested lists

Having an outer list of inner elements, each said inner element being
a flat/nested list. Each said inner list has a nesting structure that matches the inner list in the preceding outer cell. Meaning that each primitive value in a list either corresponds to a primitive value or to a list - in the following cell list (recursively applied). Thus, each inner list has a depth that is equal to or exceeds by 1 the depth of the element in the preceding cell.
(Notice that the first cell element can start as a nested list of any depth).
Example of the above:
[
[[1, 2, [3, 4]], 1 ],
[[3, [4, 5], [6, 7]], [5, 4] ],
[[5, [6, 7], [8, 9]], [7, [8, 6]] ],
]
It is desired to unfold the nested lists into a list of tuples, where each value is combined either with the parent value, or the corresponding list element if parent is list (with order being maintained). So for the above example list, the output should be:
[
(1, 3, 5),
(2, 4, 6),
(2, 5, 7),
(3, 6, 8),
(4, 7, 9),
(1, 5, 7),
(1, 4, 8),
(1, 4, 6),
]
Note: this question is an expansion on a previous question here, but unlike the linked question, the desired tuples here are flat.

Ok, how about this:
x = [
[[1, 2, [3, 4]], 1 ],
[[3, [4, 5], [6, 7]], [5, 4] ],
[[5, [6, 7], [8, 9]], [7, [8, 6]] ],
]
from collections import defaultdict
def g(x):
paths = defaultdict(lambda: [])
def calculate_paths(item, counts):
if type(item) is list:
for i, el in enumerate(item):
calculate_paths(el, counts + (i,))
else:
paths[counts].append(item)
def recreate_values(k, initial_len, desired_len):
if len(paths[k]) + initial_len == desired_len:
yield paths[k]
else:
for ks in keysets:
if len(ks) > len(k) and ks[0:len(k)] == k:
for ks1 in recreate_values(ks, initial_len + len(paths[k]), desired_len):
yield paths[k] + ks1
for lst in x:
calculate_paths(lst, (0,))
keysets = sorted(list(paths.keys()))
for k in keysets:
yield from recreate_values(k, 0, len(x))
>>> import pprint
>>> pprint.pprint(list(g(x)))
[[1, 3, 5],
[2, 4, 6],
[2, 5, 7],
[3, 6, 8],
[4, 7, 9],
[1, 5, 7],
[1, 4, 8],
[1, 4, 6]]
Works by creating a "path" for each number in the structure, which is a tuple which identifies how it fits in its particular row.
(Original attempt):
If it's always three levels, then something like this?
def listify(lst):
max_len = max(len(item) if type(item) is list else 1 for item in lst)
yield from zip(*[item if type(item) is list else [item] * max_len for item in lst])
def f():
for i in listify(x):
for j in listify(i):
for k in listify(j):
yield k
>>> list(f())

This is one heck of a problem to solve :-)
I did managed to get the solution for different levels as expected. However, I made one assumption to do that:
The last column of the input is the pointer to other columns
If that is no issue, the following solution will work fine :-)
input = [
[[1, 2, [3, 4]], 1 ],
[[3, [4, 5], [6, 7]], [5, 4] ],
[[5, [6, 7], [8, 9]], [7, [8, 6]] ],
]
def level_flatten(level):
"""
This method compares the elements and their types of last column and
makes changes to other columns accordingly
"""
for k, l in level.items():
size = len(l[-1]) if isinstance(l[-1], list) else 1
# Mostly l[-1] is going to be a list; this is for just in case
elements = l[-1]
for i in range(-1, -len(l)-1, -1):
elem = l[i]
if isinstance(l[i], int):
l[i] = [elem] * size
else:
for j in range(len(elem)):
if not isinstance(elem[j], type(elements[j])):
# For a list found in elements[j], there is a int at l[i][j]
elem[j] = [elem[j]] * len(elements[j])
return level
level = {}
for i in range(len(input[0])):
level[i] = []
for j in input:
level[i].append(j[i])
for k, l in level.items():
for i in range(len(l[-1])):
level = level_flatten(level)
total_flat = []
for item in l:
row = []
for x in item:
if isinstance(x, list):
row += x
else:
row.append(x)
total_flat.append(row)
level[k] = total_flat
output_list = []
for i in range(len(level)):# For maintaining the order
output_list += zip(*level[i])
print output_list
I know this is not a pretty solution and could be optimized further. I am trying to think of a better algorithm than this. Will update if I gets to a better solution :-)

I first tried to solve this using a 2d matrix but turned out it's simpler to iterate over the last row dividing the column segments above it:
def unfold(ldata):
'''
ldata: list of hierarchical lists.
technique: repeatedly flatten bottom row one level at a time, unpacking lists or
adding repeats in the column above at the same time.
convention: n=1 are primitives, n>=2 are lists.
'''
has_lists = True
while has_lists:
has_lists = False
for i, elm in enumerate(ldata[-1]):
if type(elm) is list:
has_lists = True
ldata[-1][i:i+1] = ldata[-1][i] # unpack
for k in range(0, len(ldata)-1): # over corresponding items in above column
if type(ldata[k][i]) is list:
ldata[k][i:i+1] = ldata[k][i] # unpack
else:
ldata[k][i:i+1] = [ldata[k][i]]*len(elm) # add repeats
return list(zip(*ldata))
x = [
[[1, 2, [3, 4]], 1 ],
[[3, [4, 5], [6, 7]], [5, 4] ],
[[5, [6, 7], [8, 9]], [7, [8, 6]] ],
]
from pprint import pprint
pprint(unfold(x))
>>>
[(1, 3, 5),
(2, 4, 6),
(2, 5, 7),
(3, 6, 8),
(4, 7, 9),
(1, 5, 7),
(1, 4, 8),
(1, 4, 6)]

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

List comprehension syntax for pool.startmap_async() - python

Related

Python - How to find the longest sequence in a integer list of neighbor numbers

2D Vectorization of unique values per row with condition

Finding Indices for Repeat Sequences in NumPy Array

python - iterate over all pairs of consecutive items with gaps

List of tuples from heirarchical nested lists

Categories

Resources