Delete last N elements if they are 0 and constant - python

I have an array such as
data = [
[1, 0],
[2, 0],
[3, 1],
[4, 1],
[5, 1],
[6, 0],
[7, 0]]
and I want the result to be
verified_data = [[1, 0], [2, 0], [3, 1]]
So how can I remove the trailing elements whose second value is 0, and then also the trailing elements whose second value repeats (keeping the first one of that run)? What is the proper way to achieve this? Using numpy is also fine.
Edit: I have written a solution, even if it looks ugly:
def verify_data(data):
    rev_data = reversed(data)
    for i, row in list(enumerate(rev_data)):
        if row[1] == 0:
            del data[-1]
        else:
            break
    rev_data = reversed(data)
    last_same_data = None
    for i, row in list(enumerate(rev_data)):
        if not last_same_data:
            last_same_data = row[1]
            continue
        if last_same_data == row[1]:
            del data[-1]
        else:
            break
    return data

I've split removing trailing zeros and removing trailing duplicates into two functions, using negative indices (list[-n]) to avoid explicit index tracking.
In [20]: def remove_trailing_duplicates(dat):
    ...:     key = dat[-1][1]
    ...:     while (len(dat) > 1) and (dat[-2][1] == key):
    ...:         dat.pop()          # Remove the last item.
    ...:         key = dat[-1][1]   # Reset key to last item.

In [21]: def remove_trailing_zeros(dat):
    ...:     # len(dat)>0 can give an empty list, >1 leaves at least the first item
    ...:     while len(dat) > 0 and dat[-1][1] == 0:
    ...:         dat.pop()
In [22]: data = [
...: [1, 0],
...: [2, 0],
...: [3, 1],
...: [4, 1],
...: [5, 1],
...: [6, 0],
...: [7, 0]]
In [23]: remove_trailing_zeros(data)
In [24]: data
Out[24]: [[1, 0], [2, 0], [3, 1], [4, 1], [5, 1]]
In [25]: remove_trailing_duplicates(data)
In [26]: data
Out[26]: [[1, 0], [2, 0], [3, 1]]
This works with the data you used in the question, and the duplicates function stops once only one item is left. What would you want if ALL the data items were [n, 0]: an empty list, or the first item remaining?
HTH
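Since the question says NumPy is also fine, here is a minimal sketch of the same two-step trim done with array operations (verify_data_np is a hypothetical name; as written it returns an empty list when every flag is 0, and keeps only the first row when all flags are equal):
import numpy as np

def verify_data_np(data):
    # A sketch of the two-step trim above using NumPy, not the approach from the answer.
    arr = np.asarray(data)
    flags = arr[:, 1]
    # Step 1: drop trailing rows whose flag is 0.
    nonzero = np.flatnonzero(flags != 0)
    arr = arr[:nonzero[-1] + 1] if nonzero.size else arr[:0]
    # Step 2: drop trailing duplicates of the last flag, keeping the first of that run.
    if len(arr) > 1:
        flags = arr[:, 1]
        changed = np.flatnonzero(flags != flags[-1])
        arr = arr[:changed[-1] + 2] if changed.size else arr[:1]
    return arr.tolist()

data = [[1, 0], [2, 0], [3, 1], [4, 1], [5, 1], [6, 0], [7, 0]]
print(verify_data_np(data))   # [[1, 0], [2, 0], [3, 1]]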

Related

Order a nested list based on custom condition

I have a list of lists:
[[1, 0], [2, 1], [5, 4], [1, 3], [4, 1], [3, 2], [0, <NA>]]
The sublist having <NA> as its second element will always be the first list in the result.
For any two consecutive lists, the first element of the first list should match the second element of the second list, e.g.: [0, <NA>], [1, 0], [2, 1]
The resultant list should cover all the elements from original list.
Expected output:
[[0, <NA>], [1, 0], [2, 1], [3, 2], [1, 3], [4, 1], [5, 4]]
Here, after [1, 0], we could have gone to [4, 1] as well; but that would be wrong, since we then wouldn't be able to cover all the elements of the original list. I am using Python as the programming language here. Any help would be appreciated. Please and thanks.
(Swapping your <NA> for a None) this looks for the longest path through the list that visits all elements exactly once.
def sort_path(elements):
    def f(prefix, seq):
        # get the current element to match
        curr = prefix[-1][0] if len(prefix) > 0 else None
        # get possible next nodes in the path
        next_nodes = [x for x in seq if x[1] == curr]
        # get candidate paths from each next node
        candidates = [f(prefix + [n], [x for x in seq if x != n]) for n in next_nodes]
        # return the longest path from the candidates (or the prefix if no candidates)
        return prefix if len(candidates) == 0 else max(candidates, key=len)
    result = f([], elements)
    return result if len(result) == len(elements) else None
input = [[1, 0], [2, 1], [5, 4], [1, 3], [4, 1], [3, 2], [0, None]]
print(sort_path(input))
# gives: [[0, None], [1, 0], [2, 1], [3, 2], [1, 3], [4, 1], [5, 4]]
This code produces your expected output using recursion.
your_list = [[1, 0], [2, 1], [5, 4], [1, 3], [4, 1], [3, 2], [0, '<NA>']]
first_item = [x for x in your_list if x[1] == '<NA>'][0] # Assuming only one of '<NA>' exists
remaining_list = your_list.copy()
remaining_list.remove(first_item)
def get_custom_order(remaining_list, ordered_list):
    work_copy = remaining_list.copy()
    start_value = ordered_list[-1][0]
    for item in remaining_list:
        if item[1] == start_value:
            ordered_list.append(item)
            work_copy.remove(item)
            get_custom_order(work_copy, ordered_list)
            break
    return ordered_list

ordered_list = get_custom_order(remaining_list, [first_item])
print(ordered_list)
However, my answer is incomplete. This code only works because of the ordering of your input list; it does not yet guarantee that all elements are covered. I'll try to fix that and update my answer.
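To make that limitation concrete, here is a small illustration (a hypothetical reordering of the same pairs, reusing get_custom_order from above): with [4, 1] listed before [2, 1], the greedy walk takes [4, 1] first and stops early instead of covering all seven pairs.
reordered = [[1, 0], [4, 1], [5, 4], [1, 3], [2, 1], [3, 2], [0, '<NA>']]
start = [x for x in reordered if x[1] == '<NA>'][0]
rest = reordered.copy()
rest.remove(start)
print(get_custom_order(rest, [start]))
# [[0, '<NA>'], [1, 0], [4, 1], [5, 4]]  -- only four of the seven pairs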

How to find a list of all possible decompositions of a list into chunks of size bigger than 2

For a given N, I have a list of all numbers from 0 to N-1
A = list(range(0,N));
and I want to find a list of all possible decompositions into lists of sizes two or higher, without repetitions. For example, for N=4 I have
A = [0,1,2,3];
and the output I want is
OUT = [[[0, 1, 2, 3]], [[0, 1], [2, 3]], [[0, 2], [1, 3]], [[0, 3], [1, 2]]];
Up to N=5, the decomposition of the initial list into only two pieces (of length 2 and 3) makes the problem very easy. However, I can't find a way to do it for higher N, since the lists of length four must be further split into two lists of length 2.
Does anyone have any suggestions on how to solve this? I feel there must be a straightforward recursive trick to do this, but I have been trying for a day now, and I am a little stuck!
Thanks!
PS: the results for N smaller than 6 are:
N=1) OUT = [[[0]]];
N=2) OUT = [[[0, 1]]];
N=3) OUT = [[[0, 1, 2]]];
N=4) OUT = [[[0, 1, 2, 3]], [[0, 1], [2, 3]], [[0, 2], [1, 3]], [[0, 3], [1, 2]]];
N=5) OUT = [[[0, 1, 2, 3, 4]], [[0, 1], [2, 3, 4]], [[0, 2], [1, 3, 4]], [[0, 3], [1, 2, 4]], [[0, 4], [1, 2, 3]], [[1, 2], [0, 3, 4]], [[1, 3], [0, 2, 4]], [[1, 4], [0, 2, 3]], [[2, 3], [0, 1, 4]], [[2, 4], [0, 1, 3]], [[3, 4], [0, 1, 2]]];
PPS: I am a physicist and haven't been programming for a while; the code probably looks terrible, and the problem might be very easy... sorry for that!
Consider the last item, i.e. N-1, and suppose we already have two sets of combinations: one for list(range(0, N-1)) and one for list(range(0, N-2)). If we want to put the last item into these combinations, we need a different approach for each of them, as explained below.
Combinations of list(range(0, N-1)): to place the last item here, the only choice is to put it into one of the sets that already exist. To see this, suppose we have all the combinations for four items and now want to add the fifth item (i.e. 4). So we have:
[[[0, 1, 2, 3]], [[0, 1], [2, 3]], [[0, 2], [1, 3]], [[0, 3], [1, 2]]]
Adding the last item (i.e. 4) to these combinations gives us something like the following:
[[[0, 1, 2, 3, 4]], [[0, 1, 4], [2, 3]], [[0, 1], [2, 3, 4]], [[0, 2, 4], [1, 3]], [[0, 2], [1, 3, 4]], [[0, 3, 4], [1, 2]], [[0, 3], [1, 2, 4]]]
We put 4 into every set of every combination and make new combinations out of them, so for this part we need a recursive call for N-1. Note that in this case every set containing N-1 has at least three items, because we extended a set that already existed and had at least two items.
Combinations of N-2 items: this covers the case where N-1 ends up in a set with only two items. To find these combinations, we select one of the remaining N-1 items, treat the pair of that item and N-1 as one set, and find every combination of the remaining N-2 items. To give an example again, consider N = 5, so the last item is 4. If we want to pair the last item with 3, we build all combinations for (0, 1, 2) and add the pair (3, 4) to the mix. For N = 5 that part would be:
[[[0, 1, 2], [3, 4]], [[0, 1, 3], [2, 4]], [[0, 2, 3], [1, 4]], [[1, 2, 3], [0, 4]]]
I've implemented this with recursive functions in Python. I'm not a Python developer, so the implementation can probably be improved, but the algorithm works fine:
import copy

def recursive_builder(arr):
    if len(arr) == 3:
        return [[[arr[0], arr[1], arr[2]]]]
    if len(arr) == 4:
        return [[[arr[0], arr[1], arr[2], arr[3]]], [[arr[0], arr[1]], [arr[2], arr[3]]],
                [[arr[0], arr[2]], [arr[1], arr[3]]], [[arr[0], arr[3]], [arr[1], arr[2]]]]
    temp_array = arr[0:len(arr) - 1]
    recursive_builder_one_step_before = recursive_builder(temp_array)
    new_from_one_step_before = []
    last_item = arr[len(arr) - 1]
    for item in recursive_builder_one_step_before:
        for i in range(0, len(item)):
            temp_item = copy.deepcopy(item)
            temp_item[i].append(last_item)
            new_from_one_step_before.append(temp_item)
    new_from_two_step_before = []
    for i in range(0, len(temp_array)):
        new_arr = temp_array[:i] + temp_array[i + 1:]
        recursive_builder_two_step_before = recursive_builder(new_arr)
        new_from_two_step_before_inner = []
        for item in recursive_builder_two_step_before:
            new_item = item + [[temp_array[i], last_item]]
            new_from_two_step_before_inner.append(new_item)
        new_from_two_step_before = new_from_two_step_before + new_from_two_step_before_inner
    return new_from_two_step_before + new_from_one_step_before

N = 6
recursive_builder(list(range(0, N)))
You could run this code on Colab
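As a quick sanity check, the counts listed in the question for small N (1 decomposition for N=3, 4 for N=4, 11 for N=5; the base cases start at N=3) can be compared against the function's output:
# Sanity check against the N <= 5 results listed in the question.
for n, expected in [(3, 1), (4, 4), (5, 11)]:
    assert len(recursive_builder(list(range(n)))) == expected
print("counts for N = 3, 4, 5 match the question")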
Edit: I've added memoization to improve the performance a little, but my build_from_memory is not O(1), so the improvement could be much better if I could speed that function up.
memotization_dict = {3: [[[0, 1, 2]]],
                     4: [[[0, 1, 2, 3]], [[0, 1], [2, 3]], [[0, 2], [1, 3]], [[0, 3], [1, 2]]]}

def build_from_memory(arr):
    memory = memotization_dict[len(arr)]
    ret_val = []
    for item in memory:
        l2 = []
        for i in item:
            l1 = []
            for j in i:
                l1.append(arr[j])
            l2.append(l1)
        ret_val.append(l2)
    return ret_val

def recursive_builder(arr):
    if len(arr) in memotization_dict:
        return build_from_memory(arr)
    temp_array = arr[0:len(arr) - 1]
    recursive_builder_one_step_before = recursive_builder(temp_array)
    new_from_one_step_before = []
    last_item = arr[len(arr) - 1]
    for item in recursive_builder_one_step_before:
        for i in range(0, len(item)):
            temp_item = copy.deepcopy(item)
            temp_item[i].append(last_item)
            new_from_one_step_before.append(temp_item)
    new_from_two_step_before = []
    for i in range(0, len(temp_array)):
        new_arr = temp_array[:i] + temp_array[i + 1:]
        recursive_builder_two_step_before = recursive_builder(new_arr)
        new_from_two_step_before_inner = []
        for item in recursive_builder_two_step_before:
            new_item = item + [[temp_array[i], last_item]]
            new_from_two_step_before_inner.append(new_item)
        new_from_two_step_before = new_from_two_step_before + new_from_two_step_before_inner
    if arr == list(range(0, len(arr))):
        memotization_dict[len(arr)] = new_from_two_step_before + new_from_one_step_before
    return new_from_two_step_before + new_from_one_step_before
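A short usage sketch of the memoized version: after the first call on a canonical range, memotization_dict caches the result, so a repeated call of the same size is answered by build_from_memory and returns the same decompositions.
N = 6
first = recursive_builder(list(range(0, N)))    # computed recursively, then cached
second = recursive_builder(list(range(0, N)))   # served from memotization_dict
assert first == second
print(len(first), sorted(memotization_dict))    # the cache now also holds sizes 5 and 6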

Find all possible replacements for 1s and 0s in a list

Input:
table = [
    [1, ''],
    ['', 0]
]
Desired output:
[[[1, 0], [0, 0]], [[1, 0], [1, 0]], [[1, 1], [0, 0]], [[1, 1], [1, 0]]]
Tried to do it with recursion like this:
def get_tables(tab, out_tab):
    changes = 0
    for row_i, row in enumerate(tab):
        for val_i, val in enumerate(row):
            if val == '':
                changes = 1
                tab[row_i][val_i] = 0
                get_tables(tab, out_tab)
                tab[row_i][val_i] = 1
                get_tables(tab, out_tab)
    if changes == 0:
        out_tab.append(tab)
But it returns only 1s.
Edit:
So the logic is: find a '' in the list and then create two lists, one with 1 in place of the '' and one with 0 (like [1, ''] becoming [1, 0] and [1, 1]). Do the same operation with those two lists, and continue until there is no '' left. Return all these lists in one.
Not the most efficient solution, but it works:
table = [
    [1, ''],
    ['', 0]
]

import copy

# locations where the element is ''
locs = [[idx_r, idx_c] for idx_r, row in enumerate(table) for idx_c, element in enumerate(row) if element == '']
len_locs = len(locs)
tables = []
for i in range(2 ** len_locs):  # for n locations, we have 2**n permutations of 1 and 0
    table_copy = copy.deepcopy(table)
    permutation = list(map(int, f'{i:0{len_locs}b}'))  # convert i to binary to get the current permutation
    for it, (idx_r, idx_c) in enumerate(locs):
        table_copy[idx_r][idx_c] = permutation[it]
    tables.append(table_copy)
print(tables)
>>> [[[1, 0], [0, 0]], [[1, 0], [1, 0]], [[1, 1], [0, 0]], [[1, 1], [1, 0]]]
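For completeness, the recursive branching described in the question's edit can also be made to work. Here is a minimal sketch (fill_blanks is a hypothetical helper that returns the finished tables instead of filling an output list, deep-copying at every branch so the two alternatives don't share state):
import copy

def fill_blanks(tab):
    # Find the first '' and branch into a 0-version and a 1-version of the table.
    for row_i, row in enumerate(tab):
        for val_i, val in enumerate(row):
            if val == '':
                results = []
                for replacement in (0, 1):
                    branch = copy.deepcopy(tab)
                    branch[row_i][val_i] = replacement
                    results.extend(fill_blanks(branch))
                return results
    return [tab]  # no '' left: this table is complete

print(fill_blanks([[1, ''], ['', 0]]))
# [[[1, 0], [0, 0]], [[1, 0], [1, 0]], [[1, 1], [0, 0]], [[1, 1], [1, 0]]]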

Why this parameter list doesn't change in Python?

I created a function f which takes a 2-dimensional list as a parameter, but after calling this function the list does not change at all.
As the code below:
def f(t: [[int]]):
    for eachrow in t:
        eachrow = eachrow[1:]
        eachrow.append(0)
A = [[2, 10, 0], [3, 1, 2], [3, 2, 1]]
f(A)
print(A) # -> [[2, 10, 0], [3, 1, 2], [3, 2, 1]]
Assigning to eachrow in eachrow = eachrow[1:] rebinds the name to a new list instead of modifying the original row. To remove the first element in place, you could use del instead, or row.pop, or slice assignment.
def f(t):
    for row in t:
        del row[0]  # OR row.pop(0) OR row[:] = row[1:]
        row.append(0)
A = [[2, 10, 0], [3, 1, 2], [3, 2, 1]]
f(A)
print(A) # -> [[10, 0, 0], [1, 2, 0], [2, 1, 0]]
If you print out the results of your changes to eachrow, you'll see that you ARE updating the eachrow variable, but that doesn't affect the original t variable.
def f(t):
    for eachrow in t:
        eachrow = eachrow[1:]
        eachrow.append(0)
        print(eachrow)
>>> f(A)
[10, 0, 0]
[1, 2, 0]
[2, 1, 0]
If you want to affect the array itself, you should modify the array like so:
def f(t):
    for row_number in range(len(t)):
        t[row_number] = t[row_number][1:]
        t[row_number].append(0)
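A small illustration of the difference (not part of either answer, just a check you can run): rebinding the loop variable creates a brand-new list object, while in-place operations keep the object that the outer list still references.
row = [2, 10, 0]
alias = row
row = row[1:]            # rebinding: 'row' now names a new list
print(row is alias)      # False -> 'alias' (and any outer list) still sees [2, 10, 0]

row = [2, 10, 0]
alias = row
row[:] = row[1:] + [0]   # slice assignment mutates the existing list in place
print(row is alias)      # True -> 'alias' now sees [10, 0, 0]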

cumulative argmax of a numpy array

Consider the array a
np.random.seed([3,1415])
a = np.random.randint(0, 10, (10, 2))
a
array([[0, 2],
       [7, 3],
       [8, 7],
       [0, 6],
       [8, 6],
       [0, 2],
       [0, 4],
       [9, 7],
       [3, 2],
       [4, 3]])
What is a vectorized way to get the cumulative argmax?
array([[0, 0],  <-- both start off as the max position
       [1, 1],  <-- 7 > 0 so 1st col = 1, 3 > 2 so 2nd col = 1
       [2, 2],  <-- 8 > 7 so 1st col = 2, 7 > 3 so 2nd col = 2
       [2, 2],  <-- 0 < 8 so 1st col stays the same, 6 < 7 so 2nd col stays the same
       [2, 2],
       [2, 2],
       [2, 2],
       [7, 2],  <-- 9 is the new max of the 1st col, so its argmax is now 7
       [7, 2],
       [7, 2]])
Here is a non-vectorized way to do it.
Notice that as the window expands, argmax applies to the growing window.
pd.DataFrame(a).expanding().apply(np.argmax).astype(int).values
array([[0, 0],
       [1, 1],
       [2, 2],
       [2, 2],
       [2, 2],
       [2, 2],
       [2, 2],
       [7, 2],
       [7, 2],
       [7, 2]])
Here's a vectorized pure NumPy solution that performs pretty snappily:
def cumargmax(a):
    m = np.maximum.accumulate(a)              # running maximum down each column
    x = np.repeat(np.arange(a.shape[0])[:, None], a.shape[1], axis=1)  # row index for every column
    x[1:] *= m[:-1] < m[1:]                   # zero the index wherever the running max did not increase
    np.maximum.accumulate(x, axis=0, out=x)   # forward-fill the surviving (most recent) indices
    return x
Then we have:
>>> cumargmax(a)
array([[0, 0],
       [1, 1],
       [2, 2],
       [2, 2],
       [2, 2],
       [2, 2],
       [2, 2],
       [7, 2],
       [7, 2],
       [7, 2]])
Some quick testing on arrays with thousands to millions of values suggests that this is anywhere between 10-50 times faster than looping at the Python level (either implicitly or explicitly).
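If you want to verify that claim on your own machine, here is a rough sketch (timings vary; loop_cumargmax below is just a reference Python-level loop written for this comparison, not the pandas approach from the question):
import timeit
import numpy as np

big = np.random.randint(0, 1000, (100_000, 2))

def loop_cumargmax(a):
    # straightforward Python-level loop for comparison
    out = np.zeros(a.shape, dtype=int)
    best = a[0].copy()
    for i in range(1, len(a)):
        improved = a[i] > best
        best = np.where(improved, a[i], best)
        out[i] = np.where(improved, i, out[i - 1])
    return out

assert np.array_equal(cumargmax(big), loop_cumargmax(big))
print("vectorized:", timeit.timeit(lambda: cumargmax(big), number=10))
print("python loop:", timeit.timeit(lambda: loop_cumargmax(big), number=10))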
I can't think of a way to vectorize this over both columns easily; but if the number of columns is small relative to the number of rows, that shouldn't be an issue and a for loop should suffice for that axis:
import numpy as np
import numpy_indexed as npi
a = np.random.randint(0, 10, (10))
max = np.maximum.accumulate(a)
idx = npi.indices(a, max)
print(idx)
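For the 2-D case in the question, a minimal sketch of that per-column loop (it just wraps the same 1-D recipe; this assumes numpy_indexed is installed and that npi.indices resolves repeated maxima to the index the 1-D snippet intends):
import numpy as np
import numpy_indexed as npi

def cumargmax_columns(a):
    # apply the 1-D running-max / index-lookup recipe to each column separately
    out = np.empty(a.shape, dtype=int)
    for col in range(a.shape[1]):
        column = a[:, col]
        running_max = np.maximum.accumulate(column)
        out[:, col] = npi.indices(column, running_max)
    return out

a2 = np.random.randint(0, 10, (10, 2))
print(cumargmax_columns(a2))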
I would like to make a function that computes the cumulative argmax for a 1-D array and then apply it to all columns. This is the code:
import numpy as np

np.random.seed([3, 1415])
a = np.random.randint(0, 10, (10, 2))

def cumargmax(v):
    uargmax = np.frompyfunc(lambda i, j: j if v[j] > v[i] else i, 2, 1)
    return uargmax.accumulate(np.arange(0, len(v)), 0, dtype=np.object).astype(v.dtype)

np.apply_along_axis(cumargmax, 0, a)
The reason for converting to np.object and then converting back is a workaround for Numpy 1.9, as mentioned in generalized cumulative functions in NumPy/SciPy?
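Note that the np.object alias has since been removed from NumPy (deprecated in 1.20, dropped in 1.24), so on current versions the same trick is spelled with the builtin object dtype; this is an assumption worth checking against your NumPy version, but the logic is unchanged:
def cumargmax(v):
    uargmax = np.frompyfunc(lambda i, j: j if v[j] > v[i] else i, 2, 1)
    # dtype=object replaces the removed np.object alias; the cast back is unchanged
    return uargmax.accumulate(np.arange(len(v)), 0, dtype=object).astype(v.dtype)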
