Comparing sublists and merging them

I have a list that contains a lot of sublists, which are initially pairs of numbers, so it looks like:
list = [[2, 3], [4, 5], [7, 8], [8, 9], [11, 12], [14, 15], [15, 16], [16, 17], [17, 18], [18, 19], [20, 21]]
What I want is to compare the last element of each sublist with the first element of the next sublist and, if they match, merge them into one sublist. So the output for two matching sublists would be something like this:
output = [[7, 8, 9]]
And, of course, if there is a run of several matching sublists in a row, they should all be merged into one big sublist:
output = [[14, 15, 16, 17, 18, 19]]
I was thinking about using itemgetter as a kind of key to compare, so probably something like:
from operator import itemgetter
prev_digit = itemgetter(-1)
next_digit = itemgetter(0)
but then I realised that I don't really understand how to use it in Python, due to my lack of knowledge. I tried to think of a for loop, but it didn't work out because I didn't know how to apply those "keys".
For inspiration I looked at the question "Python, comparison sublists and making a list", but even with that I still have no solution.
Also, since my list can get fairly big (from a human perspective, around a thousand pairs or so), I'm very interested in the most efficient way to do this.
I'm new to Python, so I would be very grateful for a good explanation. I can google individual functions, so there's no need to explain them in depth, but the general logic would be nice.

I think I wrote this once. It can be done with a single pass over the list.
alist = [[2, 3], [4, 5], [7, 8], [8, 9], [11, 12], [14, 15], [15, 16], [16, 17], [17, 18], [18, 19], [20, 21]]
l = [alist[0][:]]
for e in alist[1:]:
    if l[-1][-1] == e[0]:
        l[-1].append(e[1])
    else:
        l.append(e[:])
The code reads as follows: start with a copy of the first pair, then loop over the rest. For each pair, check whether the last element of the last list in l is the same as the first element of the pair. If so, append the pair's second element to that list; otherwise, append a copy of the pair to l as a new list.
This results in l being:
[[2, 3], [4, 5], [7, 8, 9], [11, 12], [14, 15, 16, 17, 18, 19], [20, 21]]
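The same single pass can be wrapped in a function. This is a sketch rather than part of the answer: the name merge_chained_pairs is hypothetical, the comparison uses the question's prev_digit/next_digit itemgetter "keys", and extend(pair[1:]) is used so it would also work if a sublist had more than two elements. Unlike l = [alist[0][:]], it also copes with an empty input list.

```python
from operator import itemgetter

# The question's "keys": itemgetter(-1) fetches the last element, itemgetter(0) the first.
prev_digit = itemgetter(-1)
next_digit = itemgetter(0)


def merge_chained_pairs(pairs):
    """Merge consecutive sublists whose boundary elements match (hypothetical helper).

    One pass over the input, so the work grows linearly with len(pairs).
    The input lists are never mutated; every run starts as a fresh list.
    """
    merged = []
    for pair in pairs:
        if merged and prev_digit(merged[-1]) == next_digit(pair):
            # pair continues the current run: add everything after its first element
            merged[-1].extend(pair[1:])
        else:
            # pair starts a new run; list() copies it so the input stays untouched
            merged.append(list(pair))
    return merged


alist = [[2, 3], [4, 5], [7, 8], [8, 9], [11, 12], [14, 15],
         [15, 16], [16, 17], [17, 18], [18, 19], [20, 21]]
print(merge_chained_pairs(alist))
# [[2, 3], [4, 5], [7, 8, 9], [11, 12], [14, 15, 16, 17, 18, 19], [20, 21]]
```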
If you only want the longest sublist, I suggest:
>>> l = [[2, 3], [4, 5], [7, 8, 9], [11, 12], [14, 15, 16, 17, 18, 19], [20, 21]]
>>> max(l, key=len)
[14, 15, 16, 17, 18, 19]
And here it is evaluated in the interpreter (alist is left unchanged, because each pair is copied with [:]):
>>> alist = [[2, 3], [4, 5], [7, 8], [8, 9], [11, 12], [14, 15], [15, 16], [16, 17], [17, 18], [18, 19], [20, 21]]
>>>
>>> l = [alist[0][:]]
>>> for e in alist[1:]:
...     if l[-1][-1] == e[0]:
...         l[-1].append(e[1])
...     else:
...         l.append(e[:])
...
>>> l
[[2, 3], [4, 5], [7, 8, 9], [11, 12], [14, 15, 16, 17, 18, 19], [20, 21]]
>>> alist
[[2, 3], [4, 5], [7, 8], [8, 9], [11, 12], [14, 15], [15, 16], [16, 17], [17, 18], [18, 19], [20, 21]]
And compared with the reduce solution from the other answer, which takes 6.4 usec per loop:
$ python -mtimeit "list = [[2, 3], [4, 5], [7, 8], [8, 9], [11, 12], [14, 15], [15, 16], [16, 17], [17, 18], [18, 19], [20, 21]]" "reduce(lambda x,y: x[:-1] + [x[-1] + y[1:]] if x[-1][-1] == y[0] else x + [y], list[1:], [list[0]])"
100000 loops, best of 3: 6.4 usec per loop
The for loop takes 3.62 usecs:
$ python -mtimeit "alist = [[2, 3], [4, 5], [7, 8], [8, 9], [11, 12], [14, 15], [15, 16], [16, 17], [17, 18], [18, 19], [20, 21]]" "l = [alist[0][:]]" "for e in alist[1:]:" "  if l[-1][-1] == e[0]:" "    l[-1].append(e[1])" "  else:" "    l.append(e[:])"
100000 loops, best of 3: 3.62 usec per loop
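These timings are from Python 2.7.3. The for loop takes about 57% of the time of the reduce version (roughly 1.8 times as fast). The difference would likely be more pronounced with larger inputs, because every list concatenation in the reduce version builds a new list whose cost grows with the combined length of the two operands, whereas appending to a list is amortized constant time.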
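To check the scaling argument on Python 3, here is a rough benchmark sketch. The function names merge_loop and merge_reduce and the input sizes are assumptions for illustration; the input is deliberately made of disjoint pairs so that every reduce step has to copy the growing result. No timings are shown because they depend on the machine.

```python
import timeit
from functools import reduce


def merge_loop(pairs):
    # the for-loop answer, wrapped in a function
    merged = [pairs[0][:]]
    for e in pairs[1:]:
        if merged[-1][-1] == e[0]:
            merged[-1].append(e[1])
        else:
            merged.append(e[:])
    return merged


def merge_reduce(pairs):
    # the reduce answer: every step builds a brand-new accumulator list
    return reduce(
        lambda x, y: x[:-1] + [x[-1] + y[1:]] if x[-1][-1] == y[0] else x + [y],
        pairs[1:],
        [pairs[0]],
    )


for size in (100, 1000, 4000):
    # disjoint pairs [0, 1], [2, 3], ... never merge, so the result has `size` entries
    pairs = [[2 * k, 2 * k + 1] for k in range(size)]
    assert merge_loop(pairs) == merge_reduce(pairs)
    t_loop = timeit.timeit(lambda: merge_loop(pairs), number=10)
    t_reduce = timeit.timeit(lambda: merge_reduce(pairs), number=10)
    print(f"{size:>5} pairs: loop {t_loop:.4f}s, reduce {t_reduce:.4f}s")
```

If the argument above holds, the reduce column should grow noticeably faster than linearly as size increases, while the loop column grows roughly in proportion to size.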

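Using reduce (a builtin on Python 2; on Python 3 it has to be imported from functools):
>>> from functools import reduce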
>>> reduce(
... lambda x,y: x[:-1] + [x[-1] + y[1:]] if x[-1][-1] == y[0] else x + [y],
... list[1:],
... [list[0]]
... )
[[2, 3], [4, 5], [7, 8, 9], [11, 12], [14, 15, 16, 17, 18, 19], [20, 21]]
Explanation
Here is the lambda function used with reduce, in expanded form:
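from functools import reduce  # builtin on Python 2, must be imported on Python 3
def mergeOverlappingRange(x, y):
    # x: list of merged sublists so far, y: the next pair
    if x[-1][-1] == y[0]:
        # y continues the last sublist: replace it with an extended copy
        return x[:-1] + [x[-1] + y[1:]]
    else:
        # y starts a new sublist
        return x + [y]
reduce(mergeOverlappingRange, list[1:], [list[0]])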
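To watch the accumulator grow, itertools.accumulate (with its initial argument, available since Python 3.8) yields every intermediate value that reduce computes and then discards. This is an illustration added here, not part of the answer; merge_overlapping_range is the same function renamed in snake_case, and the four-pair input is a shortened version of the question's list.

```python
from functools import reduce
from itertools import accumulate


def merge_overlapping_range(x, y):
    # x: merged sublists so far, y: next pair
    if x[-1][-1] == y[0]:
        return x[:-1] + [x[-1] + y[1:]]
    return x + [y]


pairs = [[2, 3], [4, 5], [7, 8], [8, 9]]

# accumulate() yields the initial value first, then the result of each step
for step in accumulate(pairs[1:], merge_overlapping_range, initial=[pairs[0]]):
    print(step)
# [[2, 3]]
# [[2, 3], [4, 5]]
# [[2, 3], [4, 5], [7, 8]]
# [[2, 3], [4, 5], [7, 8, 9]]

# reduce() returns only the last of those values
print(reduce(merge_overlapping_range, pairs[1:], [pairs[0]]))
```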

Related

Using numpy.delete() or any other function to delete from a list of lists/arrays

I have the following list, let's call it R:
[(array([[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]),
  array([100, 101, 102])),
 (array([[10, 11, 12],
         [13, 14, 15],
         [16, 17, 18]]),
  array([103, 104, 105]))]
I want to be able to delete columns of R in a for loop, based on an index i. For example, if i = 3, the third column should be deleted, which should result in the following new list, say R1:
[(array([[1, 2],
         [4, 5],
         [7, 8]]),
  array([100, 101])),
 (array([[10, 11],
         [13, 14],
         [16, 17]]),
  array([103, 104]))]
I have zero experience with handling such multidimensional arrays, so I am unsure how to use numpy.delete(). My actual list R is pretty big, so I would appreciate it if someone could suggest how to go about the loop.

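You can use np.delete with index 2 (the third column, since NumPy indices start at 0) and axis=-1. Deleting along the last axis means the same call removes a column from the 2-D arrays and an element from the 1-D arrays.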
# if your list looks like the one below, as you describe in the question:
print(lst)
# [
# array([[1, 2, 3],
# [4, 5, 6],
# [7, 8, 9]]),
# array([100, 101, 102]),
# array([[10, 11, 12],
# [13, 14, 15],
# [16, 17, 18]]),
# array([103, 104, 105])
# ]
for idx, l in enumerate(lst):
    # delete index 2 along the last axis (columns of 2-D arrays, elements of 1-D arrays)
    lst[idx] = np.delete(l, 2, axis=-1)
print(lst)
Output:
[
 array([[1, 2],
        [4, 5],
        [7, 8]]),
 array([100, 101]),
 array([[10, 11],
        [13, 14],
        [16, 17]]),
 array([103, 104])
]
Creating the input list as in the question:
import numpy as np
lst = [[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]],
[100, 101, 102],
[[10, 11, 12],
[13, 14, 15],
[16, 17, 18]],
[103, 104, 105]
]
lst = [np.array(l) for l in lst]
Update based on a comment: if your list contains tuples of np.array, you can try the following:
lst = [
(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), np.array([100, 101, 102])),
(np.array([[10, 11, 12], [13, 14, 15], [16, 17, 18]]), np.array([103, 104, 105]))
]
for idx, tpl in enumerate(lst):
    # apply the same deletion to every array inside the tuple
    lst[idx] = tuple(np.delete(l, 2, axis=-1) for l in tpl)
print(lst)
Output:
[
 (array([[1, 2],
         [4, 5],
         [7, 8]]),
  array([100, 101])),
 (array([[10, 11],
         [13, 14],
         [16, 17]]),
  array([103, 104]))
]
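Since the question asks for a loop over a column index i, the tuple version can be wrapped in a helper that takes the question's 1-based i. This is a sketch under the assumption that every entry of R is a tuple of arrays whose last axis is the column axis, as in the example; delete_column is a hypothetical name.

```python
import numpy as np


def delete_column(R, i):
    """Return a new list in which column i (1-based, as in the question) has been
    removed from every array of every tuple in R. R itself is left unchanged,
    because np.delete always returns a new array."""
    col = i - 1  # np.delete expects a 0-based index
    return [tuple(np.delete(arr, col, axis=-1) for arr in tpl) for tpl in R]


R = [
    (np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), np.array([100, 101, 102])),
    (np.array([[10, 11, 12], [13, 14, 15], [16, 17, 18]]), np.array([103, 104, 105])),
]

# one reduced copy of R per column index, e.g. for a leave-one-column-out loop
for i in range(1, 4):
    R_i = delete_column(R, i)
    print(i, R_i[0][0].tolist(), R_i[0][1].tolist())
```

Returning a new list instead of assigning into lst[idx] keeps the original R available for the next value of i.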

Why is the value of the entire column changing after each iteration instead of just a particular cell? [duplicate]

This question already has answers here:
List of lists changes reflected across sublists unexpectedly
(17 answers)
Closed 2 years ago.
I am trying to convert a Python list into a 2-D matrix without the NumPy library.
However, my code is not working as expected:
n=3
arr = [16,13,4,16,13,19,23,8,11]
mat=[[0]*n]*n #initialising a 3*3 matrix
for a in range(n):
    for b in range(n):
        mat[a][b]=arr[b]
        print(mat) #To understand the result of each iteration
    if len(arr) >n:
        arr=arr[n:]
print(mat)
The obtained output is:
[[16, 0, 0], [16, 0, 0], [16, 0, 0]]
[[16, 13, 0], [16, 13, 0], [16, 13, 0]]
[[16, 13, 4], [16, 13, 4], [16, 13, 4]]
[[16, 13, 4], [16, 13, 4], [16, 13, 4]]
[[16, 13, 4], [16, 13, 4], [16, 13, 4]]
[[16, 13, 19], [16, 13, 19], [16, 13, 19]]
[[23, 13, 19], [23, 13, 19], [23, 13, 19]]
[[23, 8, 19], [23, 8, 19], [23, 8, 19]]
[[23, 8, 11], [23, 8, 11], [23, 8, 11]]
finally:
[[23, 8, 11], [23, 8, 11], [23, 8, 11]]
instead of:
[[16,13,4], [16,13,19], [23, 8, 11]]
Why is the value of the entire column changing after each iteration instead of just that cell?
What change to the code would give the desired result (i.e. converting the list of 9 elements into a 3x3 matrix)?

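The cause is the initialisation: [[0]*n]*n creates an outer list holding n references to the same inner list, so every assignment to mat[a][b] changes that one list and every "row" shows the change. You can avoid preallocating the matrix altogether by using slicing: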
n=3
twoD_array=[]
arr = [16,13,4,16,13,19,23,8,11]
for i in range(0,len(arr),n):
    twoD_array.append(arr[i:i+n])
print(twoD_array)
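If you prefer to keep the question's nested loop, the fix is to create a separate inner list per row. The sketch below shows the aliasing directly and then fills the matrix with the row-major index a * n + b, which replaces the arr slicing from the question (that substitution is this sketch's choice, not the answer's).

```python
n = 3
arr = [16, 13, 4, 16, 13, 19, 23, 8, 11]

# [[0] * n] * n repeats a reference to ONE inner list n times
shared = [[0] * n] * n
print(shared[0] is shared[2])  # True: every "row" is the same object

# A comprehension evaluates [0] * n once per row, giving independent lists
mat = [[0] * n for _ in range(n)]
print(mat[0] is mat[2])  # False

for a in range(n):
    for b in range(n):
        # element (a, b) of an n x n row-major matrix sits at position a * n + b
        mat[a][b] = arr[a * n + b]
print(mat)  # [[16, 13, 4], [16, 13, 19], [23, 8, 11]]
```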

How can I group a sorted pandas.Series?

Given a sorted pandas.Series (or just a list) object I want to create groups (e.g., lists or pandas.Series) such that the difference between adjacent elements in the group is less than some threshold, e.g.:
THRESHOLD = 2
sorted_list = [1, 2, 10, 15, 16, 17, 20, 21]
# ...
result = [[1, 2], [10], [15, 16, 17], [20, 21]]

You can use diff and cumsum to mark groups, then use groupby:
import pandas as pd
s = pd.Series(sorted_list)
s.groupby(s.diff().gt(THRESHOLD).cumsum()).apply(list).tolist()
# [[1, 2], [10], [15, 16, 17], [20, 21]]
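To see why this works, it helps to look at the intermediate Series. The sketch below spells out each step on the question's data: diff gives the gap to the previous element, gt(THRESHOLD) marks where a new group starts, and cumsum turns those marks into group numbers. The variable names are only for illustration.

```python
import pandas as pd

THRESHOLD = 2
s = pd.Series([1, 2, 10, 15, 16, 17, 20, 21])

gaps = s.diff()                  # NaN, 1.0, 8.0, 5.0, 1.0, 1.0, 3.0, 1.0
starts_new = gaps.gt(THRESHOLD)  # True where the gap is too big; NaN compares as False
group_ids = starts_new.cumsum()  # running count of breaks: 0, 0, 1, 2, 2, 2, 3, 3

print(group_ids.tolist())
print([g.tolist() for _, g in s.groupby(group_ids)])
# [[1, 2], [10], [15, 16, 17], [20, 21]]
```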

Using a list comprehension over the groupby object:
s = pd.Series(sorted_list)
[y.tolist() for _, y in s.groupby(s.diff().gt(THRESHOLD).cumsum())]
# [[1, 2], [10], [15, 16, 17], [20, 21]]
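Since the question also allows a plain list, the same grouping can be done without pandas in a single pass. This sketch (group_by_gap is a hypothetical name) follows the answers' rule of starting a new group when the gap is greater than the threshold, and assumes the input is already sorted.

```python
def group_by_gap(values, threshold):
    """Split a sorted sequence wherever the gap to the previous element exceeds threshold."""
    groups = []
    for v in values:
        if groups and v - groups[-1][-1] <= threshold:
            groups[-1].append(v)  # close enough to the previous element: same group
        else:
            groups.append([v])    # first element, or the gap is too big: new group
    return groups


print(group_by_gap([1, 2, 10, 15, 16, 17, 20, 21], 2))
# [[1, 2], [10], [15, 16, 17], [20, 21]]
```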

Generating list of lists with custom value limitations with Hypothesis

The Story:
Currently, I have a function-under-test that expects a list of lists of integers with the following rules:
1) the number of sublists (let's call it N) can be from 1 to 50
2) the number of values inside the sublists is the same for all sublists (rectangular form) and should be >= 0 and <= 5
3) values inside the sublists cannot be greater than or equal to the total number of sublists. In other words, each value inside a sublist is an integer >= 0 and < N
Sample valid inputs:
[[0]]
[[2, 1], [2, 0], [3, 1], [1, 0]]
[[1], [0]]
Sample invalid inputs:
[[2]] # 2 is more than N=1 (total number of sublists)
[[0, 1], [2, 0]] # 2 is equal to N=2 (total number of sublists)
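Independently of how the inputs are generated, the three rules can be written down as a plain predicate, which is handy as a single assertion inside a property-based test. This is a sketch; is_valid is a hypothetical helper, checked here against the sample inputs above.

```python
def is_valid(list_of_lists):
    """Check the three rules from the question (hypothetical helper)."""
    n = len(list_of_lists)
    # Rule 1: between 1 and 50 sublists
    if not 1 <= n <= 50:
        return False
    # Rule 2: all sublists share one width, and that width is 0..5
    widths = {len(sub) for sub in list_of_lists}
    if len(widths) != 1:
        return False
    (width,) = widths
    if not 0 <= width <= 5:
        return False
    # Rule 3: every value is a valid index into the outer list
    return all(0 <= v < n for sub in list_of_lists for v in sub)


for sample in ([[0]], [[2, 1], [2, 0], [3, 1], [1, 0]], [[1], [0]], [[2]], [[0, 1], [2, 0]]):
    print(sample, is_valid(sample))
```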
I'm trying to approach this with property-based testing, generating different valid inputs with the hypothesis library, and I'm trying to wrap my head around lists() and integers(), but I cannot make it work:
condition #1 is easy to satisfy with lists() and its min_size and max_size arguments
condition #2 is covered in the "Chaining strategies together" section of the Hypothesis documentation
condition #3 is what I'm struggling with, because if we use the rectangle_lists from that example, we don't have a reference to the length of the "parent" list inside integers()
The Question:
How can I limit the integer values inside sublists to be less than the total number of sublists?
Some of my attempts:
from hypothesis import given
from hypothesis.strategies import lists, integers
@given(lists(lists(integers(min_value=0, max_value=5), min_size=1, max_size=5), min_size=1, max_size=50))
def test(l):
    ...  # test body omitted
This one was very far from meeting the requirements: the list is not strictly rectangular, and the generated integer values can exceed the size of the generated list.
from hypothesis import given
from hypothesis.strategies import lists, integers
@given(integers(min_value=0, max_value=5).flatmap(lambda n: lists(lists(integers(min_value=1, max_value=5), min_size=n, max_size=n), min_size=1, max_size=50)))
def test(l):
    ...  # test body omitted
Here, requirements #1 and #2 were met, but the integer values can be larger than the size of the list, so requirement #3 is not met.

There's a good general technique that is often useful when trying to solve tricky constraints like this: try to build something that looks a bit like what you want but doesn't satisfy all the constraints and then compose it with a function that modifies it (e.g. by throwing away the bad bits or patching up bits that don't quite work) to make it satisfy the constraints.
For your case, you could do something like the following:
from hypothesis.strategies import builds, lists, integers
def prune_list(ls):
    n = len(ls)
    return [
        [i for i in sublist if i < n][:5]
        for sublist in ls
    ]
limited_list_strategy = builds(
    prune_list,
    # note: average_size was deprecated and later removed from Hypothesis; newer versions reject it
    lists(lists(integers(0, 49), average_size=5), max_size=50, min_size=1)
)
In this we:
Generate a list that looks roughly right (it's a list of list of integers and the integers are in the same range as all possible indices that could be valid).
Prune out any invalid indices from the sublists
Truncate any sublists that still have more than 5 elements in them
The result should satisfy all three conditions you needed.
The average_size parameter isn't strictly necessary but in experimenting with this I found it was a bit too prone to producing empty sublists otherwise.
ETA: Apologies, I've just realised that I misread one of your conditions. This doesn't quite do what you want, because it doesn't ensure that every sublist has the same length. Here's a way to modify it to fix that (it gets a bit more complicated, so I've switched to using composite instead of builds):
from hypothesis.strategies import composite, lists, integers, permutations
@composite
def limited_lists(draw):
    ls = draw(
        lists(lists(integers(0, 49), average_size=5), max_size=50, min_size=1)
    )
    filler = draw(permutations(range(50)))
    sublist_length = draw(integers(0, 5))
    n = len(ls)
    pruned = [
        [i for i in sublist if i < n][:sublist_length]
        for sublist in ls
    ]
    for sublist in pruned:
        for i in filler:
            if len(sublist) == sublist_length:
                break
            elif i < n:
                sublist.append(i)
    return pruned
The idea is that we generate a "filler" list that provides the defaults for what a sublist looks like (so the sublists will tend to shrink towards being more similar to each other), and then draw the length to prune the sublists to, which gives them a consistent length.
This has got pretty complicated I admit. You might want to use RecursivelyIronic's flatmap based version. The main reason I prefer this over that is that it will tend to shrink better, so you'll get nicer examples out of it.
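To see what the post-processing in limited_lists does without running Hypothesis, the pruning and filling steps can be pulled out into a plain function that receives the "drawn" values as arguments. This is a sketch for illustration only; prune_and_fill and the example inputs are hypothetical.

```python
def prune_and_fill(raw, filler, sublist_length):
    """Post-processing step of limited_lists, with the random draws passed in
    explicitly (hypothetical helper). Mutates and returns the pruned lists."""
    n = len(raw)
    # keep only values that are valid indices (< n), truncated to the target length
    pruned = [[i for i in sub if i < n][:sublist_length] for sub in raw]
    for sub in pruned:
        # top up short sublists with valid values taken from the filler, in order
        for i in filler:
            if len(sub) == sublist_length:
                break
            if i < n:
                sub.append(i)
    return pruned


print(prune_and_fill([[7, 0], [], [2, 2, 2]], [4, 1, 0, 2, 3], 2))
# [[0, 1], [1, 0], [2, 2]]
```

One thing this makes visible: each filler value is used at most once per sublist, so when N is smaller than the drawn sublist length, a short sublist can run out of valid filler values before it is full (for example, prune_and_fill([[1, 1, 1], []], list(range(50)), 3) gives [[1, 1, 1], [0, 1]]). That case may still need extra handling.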

You can also do this with flatmap, though it's a bit of a contortion.
from hypothesis import strategies as st
from hypothesis import given, settings
number_of_lists = st.integers(min_value=1, max_value=50)
list_lengths = st.integers(min_value=0, max_value=5)
def build_strategy(number_and_length):
    number, length = number_and_length
    list_elements = st.integers(min_value=0, max_value=number - 1)
    return st.lists(
        st.lists(list_elements, min_size=length, max_size=length),
        min_size=number, max_size=number)
mystrategy = st.tuples(number_of_lists, list_lengths).flatmap(build_strategy)
@settings(max_examples=5000)
@given(mystrategy)
def test_constraints(list_of_lists):
    N = len(list_of_lists)
    # Condition 1
    assert 1 <= N <= 50
    # Condition 2
    [length] = set(map(len, list_of_lists))
    assert 0 <= length <= 5
    # Condition 3
    assert all((0 <= element < N) for lst in list_of_lists for element in lst)
As David mentioned, this does tend to produce a lot of empty lists, so some average size tuning would be required.
>>> mystrategy.example()
[[24, 6, 4, 19], [26, 9, 15, 15], [1, 2, 25, 4], [12, 8, 18, 19], [12, 15, 2, 31], [3, 8, 17, 2], [5, 1, 1, 5], [7, 1, 16, 8], [9, 9, 6, 4], [22, 24, 28, 16], [18, 11, 20, 21], [16, 23, 30, 5], [13, 1, 16, 16], [24, 23, 16, 32], [13, 30, 10, 1], [7, 5, 14, 31], [31, 15, 23, 18], [3, 0, 13, 9], [32, 26, 22, 23], [4, 11, 20, 10], [6, 15, 32, 22], [32, 19, 1, 31], [20, 28, 4, 21], [18, 29, 0, 8], [6, 9, 24, 3], [20, 17, 31, 8], [6, 12, 8, 22], [32, 22, 9, 4], [16, 27, 29, 9], [21, 15, 30, 5], [19, 10, 20, 21], [31, 13, 0, 21], [16, 9, 8, 29]]
>>> mystrategy.example()
[[28, 18], [17, 25], [26, 27], [20, 6], [15, 10], [1, 21], [23, 15], [7, 5], [9, 3], [8, 3], [3, 4], [19, 29], [18, 11], [6, 6], [8, 19], [14, 7], [25, 3], [26, 11], [24, 20], [22, 2], [19, 12], [19, 27], [13, 20], [16, 5], [6, 2], [4, 18], [10, 2], [26, 16], [24, 24], [11, 26]]
>>> mystrategy.example()
[[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []]
>>> mystrategy.example()
[[], [], [], [], [], [], [], [], [], [], [], [], [], [], []]
>>> mystrategy.example()
[[6, 8, 22, 21, 22], [3, 0, 24, 5, 18], [16, 17, 25, 16, 11], [2, 12, 0, 3, 15], [0, 12, 12, 12, 14], [11, 20, 6, 6, 23], [5, 19, 2, 0, 12], [16, 0, 1, 24, 10], [2, 13, 21, 19, 15], [2, 14, 27, 6, 7], [22, 25, 18, 24, 9], [26, 21, 15, 18, 17], [7, 11, 22, 17, 21], [3, 11, 3, 20, 16], [22, 13, 18, 21, 11], [4, 27, 21, 20, 25], [4, 1, 13, 5, 13], [16, 19, 6, 6, 25], [19, 10, 14, 12, 14], [18, 13, 13, 16, 3], [12, 7, 26, 26, 12], [25, 21, 12, 23, 22], [11, 4, 24, 5, 27], [25, 10, 10, 26, 27], [8, 25, 20, 6, 23], [8, 0, 12, 26, 14], [7, 11, 6, 27, 26], [6, 24, 22, 23, 19]]

Pretty late, but for posterity: the easiest solution is to pick dimensions, then build up from the element strategy.
from hypothesis.strategies import composite, integers, lists
@composite
def complicated_rectangles(draw, max_N):
    list_len = draw(integers(1, max_N))
    sublist_len = draw(integers(0, 5))
    # values must be < list_len; integers() bounds are inclusive
    element_strat = integers(0, list_len - 1)
    sublist_strat = lists(
        element_strat, min_size=sublist_len, max_size=sublist_len)
    return draw(lists(
        sublist_strat, min_size=list_len, max_size=list_len))

How to select a submatrix from an adjacency list in Python?

I have an adjacency list where each array represents non-zero columns at that row (e.g. 0th array in the adj. list below means columns 2 and 6 are 1, and everything else is 0).
adj_list = [[2, 6], [1, 3, 24], [2, 4], [3, 5, 21], [4, 6, 10], [1, 5,
7], [6, 8, 9], [7], [7, 10, 14], [5, 9, 11], [10, 12, 18], [11, 13],
[12, 14, 15], [9, 13], [13, 16, 17], [15], [15], [11, 19, 20], [18],
[18], [4, 22, 23], [21], [21], [2, 25, 26], [24], [24]]
Given this adj. list, I would like to select a submatrix whose row and column indices are the same, given by:
submatrix = (0, 1, 2, 5, 22)
Each element in submatrix indicates a row number.
1) For each row i in submatrix, I need to get ith array from adj_list (which is equivalent to getting ith row from an adjacency matrix)
2) Then from that array, I need to extract the items that match with submatrix
For example, if I am currently looking at the 4th element in submatrix, which is 5, then I need to go to the 5th array in adj_list (equivalent to getting the 5th row of the adj. matrix), which is [1,5,7], and then I need to check which elements of [1,5,7] match submatrix (equivalent to getting the 1st, 5th and 7th columns of the 5th row). In this case, the result for the 5th row should be [0,1,0,1,0], because only 1 and 5 appear in both arrays.
How can I efficiently select this submatrix given the adj. list?

adj_list = [[2, 6], [1, 3, 24], [2, 4], [3, 5, 21], [4, 6, 10], [1, 5, 7], [6, 8, 9], [7], [7, 10, 14], [5, 9, 11], [10, 12, 18], [11, 13], [12, 14, 15], [9, 13], [13, 16, 17], [15], [15], [11, 19, 20], [18], [18], [4, 22, 23], [21], [21], [2, 25, 26], [24], [24]]
submatrix = (0, 1, 2, 5, 22)
result = [[i in adj_list[sm] for i in submatrix] for sm in submatrix]
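This should do it, though it produces True/False values rather than 0/1 (wrap the membership test in int(...) if you need integers). I suspect you might prefer to compute something other than this if you consider your end goal more carefully.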
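Because `i in adj_list[sm]` scans the row list once per column, a variant that converts each selected row to a set once and returns 0/1 integers (matching the [0,1,0,1,0] form in the question) could look like this. It is a sketch; select_submatrix is a hypothetical name.

```python
def select_submatrix(adj_list, indices):
    """0/1 submatrix of the adjacency matrix restricted to rows and columns `indices`."""
    rows = [set(adj_list[r]) for r in indices]  # build each selected row's set once
    return [[int(c in row) for c in indices] for row in rows]


adj_list = [[2, 6], [1, 3, 24], [2, 4], [3, 5, 21], [4, 6, 10], [1, 5, 7], [6, 8, 9], [7],
            [7, 10, 14], [5, 9, 11], [10, 12, 18], [11, 13], [12, 14, 15], [9, 13],
            [13, 16, 17], [15], [15], [11, 19, 20], [18], [18], [4, 22, 23], [21], [21],
            [2, 25, 26], [24], [24]]
submatrix = (0, 1, 2, 5, 22)

for row in select_submatrix(adj_list, submatrix):
    print(row)
# [0, 0, 1, 0, 0]
# [0, 1, 0, 0, 0]
# [0, 0, 1, 0, 0]
# [0, 1, 0, 1, 0]
# [0, 0, 0, 0, 0]
```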
