Merge two lists of tuples based on overlap python - python

Given two lists x, y such that they both have been initialized as shown below:
x = [(0, 3), (5, 8), (16, 19), (21, 24), (28, 30), (40, 42), (46, 47), (50, 54), (58, 63), (69, 71)]
y = [(9, 10), (26, 27), (29, 31), (35, 36), (41, 43), (48, 49), (66, 67), (70, 72), (77, 78), (85, 86)]
I want to form a new list of tuples where each tuple has contiguous tuples from x and an overlapping tuple from y.
For the example above, the output would be:
[((5, 8), (9, 10), (16, 19)), ((21, 24), (26, 27), (28, 30)), ((28, 30), (29, 31), (40, 42)), ((28, 30), (35, 36), (40, 42)), ((40, 42), (41, 43), (46, 47)), ((46, 47), (48, 49), (50, 54)), ((58, 63), (66, 67), (69, 71))]
My code:
lst = []
for i in range(len(x)):
    if i+1 < len(x):
        context = x[i],x[i+1]
        for j in y:
            if j[0] >= context[0][0] and j[0] <= context[1][0]:
                lst.append((context[0],j,context[1]))
I am looking for a better, more efficient way to write this code.

You can use two variables to track the current index in each of the x and y lists, advancing one of them on every iteration depending on whether the condition from the problem holds.
At every iteration, the algorithm checks whether x[i][0] < y[j][0] and x[i+1][1] > y[j][1], i.e. whether y[j] lies within the lower and upper bounds given by the contiguous tuples x[i] and x[i+1]. If the condition is true, we record the triple and increment j (the y index) to test the next element of y against the same range. Otherwise, we increment i (the x index) and repeat the process. This relies on both lists being sorted, as they are in the example.
x = [(0, 3), (5, 8), (16, 19), (21, 24), (28, 30), (40, 42), (46, 47), (50, 54), (58, 63), (69, 71)]
y = [(9, 10), (26, 27), (29, 31), (35, 36), (41, 43), (48, 49), (66, 67), (70, 72), (77, 78), (85, 86)]
i = 0
j = 0
result = list()
while i < len(x) - 1 and j < len(y):
    if y[j][0] > x[i][0] and y[j][1] < x[i + 1][1]:
        result.append((x[i], y[j], x[i + 1]))
        j += 1
    else:
        i += 1
print(result)
Output -
[((5, 8), (9, 10), (16, 19)),
((21, 24), (26, 27), (28, 30)),
((28, 30), (29, 31), (40, 42)),
((28, 30), (35, 36), (40, 42)),
((40, 42), (41, 43), (46, 47)),
((46, 47), (48, 49), (50, 54)),
((58, 63), (66, 67), (69, 71))]
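As a sketch, the same two-pointer scan can be wrapped in a reusable function. The name merge_overlaps is hypothetical, and the code assumes both lists are already sorted by start value, as they are in the example.

```python
def merge_overlaps(x, y):
    """Two-pointer scan over sorted lists x and y (hypothetical helper)."""
    i = j = 0
    result = []
    while i < len(x) - 1 and j < len(y):
        left, right = x[i], x[i + 1]
        # y[j] must start after left starts and end before right ends
        if left[0] < y[j][0] and y[j][1] < right[1]:
            result.append((left, y[j], right))
            j += 1  # test the next y tuple against the same x pair
        else:
            i += 1  # move on to the next pair of contiguous x tuples
    return result


x = [(0, 3), (5, 8), (16, 19), (21, 24), (28, 30), (40, 42), (46, 47), (50, 54), (58, 63), (69, 71)]
y = [(9, 10), (26, 27), (29, 31), (35, 36), (41, 43), (48, 49), (66, 67), (70, 72), (77, 78), (85, 86)]
triples = merge_overlaps(x, y)
print(len(triples))  # 7
```

Each index only moves forward, so the scan takes at most len(x) + len(y) steps, compared with len(x) * len(y) comparisons for the nested loops. Note that, like the loop it is based on, it never skips a y tuple: if some y[j] fits no pair, i runs to the end and later y tuples are not examined.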

You can use Python's built-in sorting to merge the two lists by start value:
from operator import itemgetter
output = sorted((x + y), key=itemgetter(0))
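Sorting alone does not build the triples. One possible way to exploit sorted order, sketched here under the assumption that x is sorted by start value, is to binary-search the x starts for each y tuple. The helper name triples_by_bisect is hypothetical, and the matching rule (x[i] starts at or before y, x[i+1] starts after it) is close to the condition in the question rather than the end-point check used in the two-pointer answer.

```python
from bisect import bisect_right


def triples_by_bisect(x, y):
    """For each y tuple, find the x pair whose start values surround y's start."""
    starts = [t[0] for t in x]  # assumes x is sorted by start value
    result = []
    for t in y:
        i = bisect_right(starts, t[0]) - 1  # last x tuple starting at or before t
        if 0 <= i < len(x) - 1:             # a following x tuple must exist too
            result.append((x[i], t, x[i + 1]))
    return result


x = [(0, 3), (5, 8), (16, 19), (21, 24), (28, 30), (40, 42), (46, 47), (50, 54), (58, 63), (69, 71)]
y = [(9, 10), (26, 27), (29, 31), (35, 36), (41, 43), (48, 49), (66, 67), (70, 72), (77, 78), (85, 86)]
found = triples_by_bisect(x, y)
print(len(found))  # 7
```

On the sample data this yields the same seven triples as the expected output, using one binary search per y tuple.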

Related

How to create a list from a dict of lists that has combinations of the array elements [closed]

The question might have been worded confusingly so here I will try to make it more clear. Suppose I have a dict like x = {0: [(36, 44)], 1: [(38, 39), (38, 40), (39, 40)], 2: [(37, 41), (37, 42), (41, 42)], 3: [(43,)], 4: [(45,)]}
I want to create a list with all possible combinations of 1 element from each of the array. E.g. - [(36,44),(38,39),(37,41),(43),(45)] is one such combination, [(36,44),(38,40),(37,41),(43),(45)] is another. The final list should have all such combinations.
For now as I know the key values, I tried doing it as
arr_0 = x[0]
arr_1 = x[1]
arr_2 = x[2]
arr_3 = x[3]
arr_4 = x[4]
and then finally,
y = itertools.product(arr_0, arr_1, arr_2, arr_3, arr_4)
which produces the answer
((36, 44), (38, 39), (37, 41), (43,), (45,))
((36, 44), (38, 39), (37, 42), (43,), (45,))
((36, 44), (38, 39), (41, 42), (43,), (45,))
((36, 44), (38, 40), (37, 41), (43,), (45,))
((36, 44), (38, 40), (37, 42), (43,), (45,))
((36, 44), (38, 40), (41, 42), (43,), (45,))
((36, 44), (39, 40), (37, 41), (43,), (45,))
((36, 44), (39, 40), (37, 42), (43,), (45,))
((36, 44), (39, 40), (41, 42), (43,), (45,))
In a scenario where I do not know the keys of the dict or the array sizes, how can I create the final array?
You can use itertools.product and pass it the arrays you need:
from itertools import product
x = {0: [(36, 44)], 1: [(38, 39), (38, 40), (39, 40)], 2: [(37, 41), (37, 42)]}
for p in product(*x.values()):
    print(p)
((36, 44), (38, 39), (37, 41))
((36, 44), (38, 39), (37, 42))
((36, 44), (38, 40), (37, 41))
((36, 44), (38, 40), (37, 42))
((36, 44), (39, 40), (37, 41))
((36, 44), (39, 40), (37, 42))
x.values() yields the dict's values, here the lists of tuples [(36, 44)], [(38, 39), (38, 40), (39, 40)] and [(37, 41), (37, 42)]. The * unpacks them so that each sublist is passed to product as a separate argument.
A smaller example to illustrate:
for p in product([1, 2], [3], (5, 6)):
    print(p)
(1, 3, 5)
(1, 3, 6)
(2, 3, 5)
(2, 3, 6)
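As a minimal sketch, the same call applied to the full dict from the question collects every combination into a list without naming the keys. Sorting the keys is an added assumption here, so the tuple positions follow key order rather than insertion order.

```python
from itertools import product

x = {0: [(36, 44)], 1: [(38, 39), (38, 40), (39, 40)],
     2: [(37, 41), (37, 42), (41, 42)], 3: [(43,)], 4: [(45,)]}

# Take the sublists in key order, then build every combination of one
# element per sublist.
combos = list(product(*(x[k] for k in sorted(x))))

print(len(combos))  # 1 * 3 * 3 * 1 * 1 = 9
print(combos[0])    # ((36, 44), (38, 39), (37, 41), (43,), (45,))
```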

How to divide a specific number in a list?

A Mersenne prime has the form 2^n-1. I have created a new type of factoring method for numbers of this form that are not Mersenne primes. It is very abstract. Its premise is that if a specific number is applied using modular math and the new number becomes zero, it is not a Mersenne prime. I submitted a paper to The Journal of Number Theory online, but it was rejected by the journal. I have attached it (a PDF I sent to the Journal of Number Theory) if you would like to look it over. I still feel my method is promising, yet I'm no coding expert. My problem is that in my new code I don't know how to divide a number in the list. The list enumerates fine, but I want to subtract z=11 from 253, which equals 242, and then take that mod 121. However, when I create a range from 1 to 254 I cannot seem to do this math. The reason I'm interested in this is that 253//11=23, which is a factor of 2^11-1. I got this idea from a ratio page.
Type 1:11 and the second number is 22; just add 1 and it's 23.
Check it out
https://goodcalculators.com/ratio-calculator/
The formula will target any number in the range and what I'm looking for is a zero.
Additional details for grismar as per request:
Grismar and others,
What I have found is that Mersenne primes will produce fewer zeros below the number 11 than a number like 2^11-1. Also, when you subtract z from the number and then take the result mod z*z, you may find the number with the lowest factor in it after you divide it by z. The range must be large enough to find that number, yet if the result is zero, simply divide by z. For instance, you find 23 by dividing 11 into 253. You can then divide 23 into 2047 and you should get 89. More than likely, if you use a different number to check this factor you will get a fraction. So this method finds a zero for an exponent which does not produce a Mersenne prime. Let's pick 29: 536870911 ÷ 233 = 2304167, so you get a whole-number factor, not a fraction.
These are all the factors of 536870911
[1, 233, 1103, 256999, 2089, 486737, 2304167, 536870911]
If you would like even more details leave a comment please.
Programmer in learning looking for help here is my program:
1 should be the start range!
while True:
    x = int(input("Use 1 for the start range to make this work correctly: "))
    i = int(input("End Range: "))
    z = int(input("square of primes multiplied by a number plus z which does not make a mersenne prime, this finds its factor of z: "))
    fact = [(i + 1, x) for i, x in enumerate(range(x, i))]
    print([((int(i)-z) % (z*z)) if isinstance(i, str) else i for i in fact])
Maybe this is what you are trying to do. The int call is unnecessary since the values are integers from the start. Also, don't use the same variable i for different purposes:
calculations = [
    (index + 1, (fact_tuple[0] - z) % (z*z)) for index, fact_tuple in enumerate(fact)
]
print(calculations) # with x = 1, i = 254, z = 11
>>> [(1, 111), (2, 112), (3, 113), (4, 114), (5, 115), (6, 116), (7, 117), (8, 118), (9, 119), (10, 120), (11, 0), (12, 1), (13, 2), (14, 3), (15, 4), (16, 5), (17, 6), (18, 7), (19, 8), (20, 9), (21, 10), (22, 11), (23, 12), (24, 13), (25, 14), (26, 15), (27, 16), (28, 17), (29, 18), (30, 19), (31, 20), (32, 21), (33, 22), (34, 23), (35, 24), (36, 25), (37, 26), (38, 27), (39, 28), (40, 29), (41, 30), (42, 31), (43, 32), (44, 33), (45, 34), (46, 35), (47, 36), (48, 37), (49, 38), (50, 39), (51, 40), (52, 41), (53, 42), (54, 43), (55, 44), (56, 45), (57, 46), (58, 47), (59, 48), (60, 49), (61, 50), (62, 51), (63, 52), (64, 53), (65, 54), (66, 55), (67, 56), (68, 57), (69, 58), (70, 59), (71, 60), (72, 61), (73, 62), (74, 63), (75, 64), (76, 65), (77, 66), (78, 67), (79, 68), (80, 69), (81, 70), (82, 71), (83, 72), (84, 73), (85, 74), (86, 75), (87, 76), (88, 77), (89, 78), (90, 79), (91, 80), (92, 81), (93, 82), (94, 83), (95, 84), (96, 85), (97, 86), (98, 87), (99, 88), (100, 89), (101, 90), (102, 91), (103, 92), (104, 93), (105, 94), (106, 95), (107, 96), (108, 97), (109, 98), (110, 99), (111, 100), (112, 101), (113, 102), (114, 103), (115, 104), (116, 105), (117, 106), (118, 107), (119, 108), (120, 109), (121, 110), (122, 111), (123, 112), (124, 113), (125, 114), (126, 115), (127, 116), (128, 117), (129, 118), (130, 119), (131, 120), (132, 0), (133, 1), (134, 2), (135, 3), (136, 4), (137, 5), (138, 6), (139, 7), (140, 8), (141, 9), (142, 10), (143, 11), (144, 12), (145, 13), (146, 14), (147, 15), (148, 16), (149, 17), (150, 18), (151, 19), (152, 20), (153, 21), (154, 22), (155, 23), (156, 24), (157, 25), (158, 26), (159, 27), (160, 28), (161, 29), (162, 30), (163, 31), (164, 32), (165, 33), (166, 34), (167, 35), (168, 36), (169, 37), (170, 38), (171, 39), (172, 40), (173, 41), (174, 42), (175, 43), (176, 44), (177, 45), (178, 46), (179, 47), (180, 48), (181, 49), (182, 50), (183, 51), (184, 52), (185, 53), (186, 54), (187, 55), (188, 56), (189, 57), (190, 58), 
(191, 59), (192, 60), (193, 61), (194, 62), (195, 63), (196, 64), (197, 65), (198, 66), (199, 67), (200, 68), (201, 69), (202, 70), (203, 71), (204, 72), (205, 73), (206, 74), (207, 75), (208, 76), (209, 77), (210, 78), (211, 79), (212, 80), (213, 81), (214, 82), (215, 83), (216, 84), (217, 85), (218, 86), (219, 87), (220, 88), (221, 89), (222, 90), (223, 91), (224, 92), (225, 93), (226, 94), (227, 95), (228, 96), (229, 97), (230, 98), (231, 99), (232, 100), (233, 101), (234, 102), (235, 103), (236, 104), (237, 105), (238, 106), (239, 107), (240, 108), (241, 109), (242, 110), (243, 111), (244, 112), (245, 113), (246, 114), (247, 115), (248, 116), (249, 117), (250, 118), (251, 119), (252, 120), (253, 0)]
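To pick out only the zeros, a short sketch like the following may help. It merely reproduces the arithmetic quoted in the question for z = 11 and does not validate the method in general; the variable names are illustrative.

```python
z = 11
mersenne = 2 ** 11 - 1  # 2047, which is not prime

# Numbers n in 1..253 for which (n - z) % (z*z) is zero
zeros = [n for n in range(1, 254) if (n - z) % (z * z) == 0]
print(zeros)  # [11, 132, 253]

# Dividing the largest hit by z gives the candidate factor from the question
candidate = zeros[-1] // z
quotient, remainder = divmod(mersenne, candidate)
print(candidate, quotient, remainder)  # 23 89 0
```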

Sorting Tuples Python

I want to sort tuples using this method...
If (a1,b1) < (a2,b2) then a2>a1 or (a1==a2 and b2>b1).
The algorithm should not work in place, and it's expected that it will receive numbers in the range [0,99].
Input:
[(9, 7), (78, 24), (17, 74), (53, 81), (40, 43), (79, 82), (84, 46), (68, 53),
(92, 95), (60, 38), (20, 62), (72, 57)]
Output:
[(9, 7), (17, 74), (20, 62), (40, 43), (53, 81), (60, 38), (68, 53), (72, 57),
(78, 24), (79, 82), (84, 46), (92, 95)]
I thought of using the concept of counting sort since the time complexity has to be O(n), but then the list counter length would be 100*100. That wouldn't be a very efficient approach.
Do you have any suggestions?
The built-in sorted() function should work just fine for your case: it compares the first elements and, if they are equal for two items, compares the second elements, and so on. It returns a new list rather than sorting in place, although it runs in O(n log n) time rather than O(n).
In the following example, simple_list[0][0] and simple_list[1][0] are equal (4 and 4), so simple_list[0][1] and simple_list[1][1] (3 and 5) are compared:
>>> simple_list = [(4, 3), (4, 5), (1, 2)]
>>> sorted(simple_list)
[(1, 2), (4, 3), (4, 5)]
For your case, try the following:
tuples_list = [(9, 7), (78, 24), (17, 74), (53, 81), (40, 43), (79, 82), (84, 46), (68, 53), (92, 95), (60, 38), (20, 62), (72, 57)]
sorted_list = sorted(tuples_list)
Output:
>>> sorted(tuples_list)
[(9, 7), (17, 74), (20, 62), (40, 43), (53, 81), (60, 38), (68, 53), (72, 57), (78, 24), (79, 82), (84, 46), (92, 95)]
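If linear time is a hard requirement, one alternative is a two-pass bucket (radix) sort with 100 buckets per pass instead of a 100*100 counter. This is a sketch under the assumption that both components are integers in [0, 99]; the function names are hypothetical.

```python
def bucket_pass(items, index, size=100):
    """Stable bucket pass on items[k][index], assumed to be an int in [0, size)."""
    buckets = [[] for _ in range(size)]
    for item in items:
        buckets[item[index]].append(item)
    # Concatenating the buckets in order keeps equal keys in arrival order
    return [item for bucket in buckets for item in bucket]


def radix_sort_pairs(pairs):
    """Return a new sorted list: second component first, then first component."""
    return bucket_pass(bucket_pass(pairs, 1), 0)


tuples_list = [(9, 7), (78, 24), (17, 74), (53, 81), (40, 43), (79, 82),
               (84, 46), (68, 53), (92, 95), (60, 38), (20, 62), (72, 57)]
radix_sorted = radix_sort_pairs(tuples_list)
print(radix_sorted == sorted(tuples_list))  # True
```

Because each pass is stable, sorting by the second component and then by the first gives the required lexicographic order in O(n + 100) time per pass, and the input list is left unmodified.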

partitionBy assigns partitions, but WHERE in each partition

With hash function:
balanceLoad = lambda x: bisect.bisect_left(boundary_array, -keyfunc(x))
Where boundary_array is [-64, -10, 35]
The following tells me which partition to assign each element to:
rdd.partitionBy(numPartitions, balanceLoad)
However, is there a way to determine or control WHERE within each partition the elements are placed, for example {1,2,3} vs {3,2,1}?
For example when I do this:
rdd = CleanRDD(sc.parallelize(range(100), 4).map(lambda x: (x *((-1) ** x) , x)))
sortByKey(rdd, keyfunc=lambda key: key, ascending=False).collect()
Elements in each partition are in reverse order:
[(64, 64),
(66, 66),
(68, 68),
(70, 70),
(72, 72),
(74, 74),
(76, 76),
(78, 78),
(80, 80),
(82, 82),
(84, 84),
(86, 86),
(88, 88),
(90, 90),
(92, 92),
(94, 94),
(96, 96),
(98, 98),
(10, 10),
(12, 12),
(14, 14),
(16, 16),
(18, 18),
(20, 20),
(22, 22),
(24, 24),
(26, 26),
(28, 28),
(30, 30),
(32, 32),
(34, 34),
(36, 36),
(38, 38),
(40, 40),
(42, 42),
(44, 44),
(46, 46),
(48, 48),
(50, 50),
(52, 52),
(54, 54),
(56, 56),
(58, 58),
(60, 60),
(62, 62),
(-35, 35),
(-33, 33),
(-31, 31),
(-29, 29),
(-27, 27),
(-25, 25),
(-23, 23),
(-21, 21),
(-19, 19),
(-17, 17),
(-15, 15),
(-13, 13),
(-11, 11),
(-9, 9),
(-7, 7),
(-5, 5),
(-3, 3),
(-1, 1),
(0, 0),
(2, 2),
(4, 4),
(6, 6),
(8, 8),
(-99, 99),
(-97, 97),
(-95, 95),
(-93, 93),
(-91, 91),
(-89, 89),
(-87, 87),
(-85, 85),
(-83, 83),
(-81, 81),
(-79, 79),
(-77, 77),
(-75, 75),
(-73, 73),
(-71, 71),
(-69, 69),
(-67, 67),
(-65, 65),
(-63, 63),
(-61, 61),
(-59, 59),
(-57, 57),
(-55, 55),
(-53, 53),
(-51, 51),
(-49, 49),
(-47, 47),
(-45, 45),
(-43, 43),
(-41, 41),
(-39, 39),
(-37, 37)]
Notice that elements in each of the three groups are in reverse order.
How can I correct this?
Determine: no, because the order of the shuffle is nondeterministic.
You can control the order, but not as part of the partitioning process, or at least not in PySpark. Instead, you can take an approach similar to sortByKey and enforce the order per partition afterwards:
def applyOrdering(iter):
    """Takes an itertools.chain object
    and returns iterable with specific ordering"""
    ...
rdd.partitionBy(numPartitions, balanceLoad).mapPartitions(applyOrdering)
Note that iter may be too large to fit into memory, so you should either increase the granularity (use more partitions) or use a sorting mechanism which doesn't require reading all the data at once.
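One possible body for such a function, as a sketch only: it sorts each partition's (key, value) pairs by key, descending by default, and assumes a single partition fits in memory. The name apply_ordering and its reverse parameter are hypothetical; the example runs on a plain list standing in for one partition.

```python
def apply_ordering(iterator, reverse=True):
    """Sort one partition's (key, value) pairs by key.

    Assumes the whole partition fits in memory.
    """
    return iter(sorted(iterator, key=lambda kv: kv[0], reverse=reverse))


# Stand-in for the contents of a single partition. With PySpark the function
# would be passed to mapPartitions after partitionBy, e.g.
# rdd.partitionBy(numPartitions, balanceLoad).mapPartitions(apply_ordering)
partition = [(64, 64), (66, 66), (98, 98), (70, 70)]
ordered = list(apply_ordering(iter(partition)))
print(ordered)  # [(98, 98), (70, 70), (66, 66), (64, 64)]
```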

Append values of two strings into pairs

I start with two numpy arrays, the "x values" and the "y values":
import numpy as np
x = np.arange(100)
y = np.arange(100)
The output is
[ 0 1 2 3 4 ..... 96 97 98 99]
[ 0 1 2 3 4 ..... 96 97 98 99]
I would like to append these values together into an array of len() = 100 such that the output is
[ (0,0) (1,1) (2,2) (3,3) .... (98,98) (99,99) ]
How does one use indexing to both (A) put the pairs in the correct order and (B) put the parentheses ( and comma , in the correct places?
For your particular requirement, you can use the built-in zip function, which combines multiple lists at their corresponding indexes (that is, the ith elements of all the lists passed to it are combined into the ith item of the returned iterator).
Example -
import numpy as np
x = np.arange(100)
y = np.arange(100)
print(list(zip(x,y)))
>>> [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8), (9, 9), (10, 10), (11, 11), (12, 12), (13, 13), (14, 14), (15, 15), (16, 16), (17, 17), (18, 18), (19, 19), (20, 20), (21, 21), (22, 22), (23, 23), (24, 24), (25, 25), (26, 26), (27, 27), (28, 28), (29, 29), (30, 30), (31, 31), (32, 32), (33, 33), (34, 34), (35, 35), (36, 36), (37, 37), (38, 38), (39, 39), (40, 40), (41, 41), (42, 42), (43, 43), (44, 44), (45, 45), (46, 46), (47, 47), (48, 48), (49, 49), (50, 50), (51, 51), (52, 52), (53, 53), (54, 54), (55, 55), (56, 56), (57, 57), (58, 58), (59, 59), (60, 60), (61, 61), (62, 62), (63, 63), (64, 64), (65, 65), (66, 66), (67, 67), (68, 68), (69, 69), (70, 70), (71, 71), (72, 72), (73, 73), (74, 74), (75, 75), (76, 76), (77, 77), (78, 78), (79, 79), (80, 80), (81, 81), (82, 82), (83, 83), (84, 84), (85, 85), (86, 86), (87, 87), (88, 88), (89, 89), (90, 90), (91, 91), (92, 92), (93, 93), (94, 94), (95, 95), (96, 96), (97, 97), (98, 98), (99, 99)]
Note that in Python 2.x you do not need list(zip(...)), since zip itself returns a list. In Python 3.x, zip returns an iterator, so to print it we need to convert it into a list.
You can use np.dstack to get the columns :
>>> np.dstack((x,y))
array([[[ 0, 0],
[ 1, 1],
[ 2, 2],
[ 3, 3],
[ 4, 4],
[ 5, 5],
[ 6, 6],
[ 7, 7],
[ 8, 8],
[ 9, 9],
...
[99, 99]]])
And if you want tuples instead of lists, you can use map to convert each row to a tuple (in Python 3, wrap the call in list(...), since map returns an iterator):
>>> map(tuple,np.dstack((x,y))[0])
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8), (9, 9), (10, 10), (11, 11), (12, 12), (13, 13), (14, 14), (15, 15), (16, 16), (17, 17), (18, 18), (19, 19), (20, 20), (21, 21), (22, 22), (23, 23), (24, 24), (25, 25), (26, 26), (27, 27), (28, 28), (29, 29), (30, 30), (31, 31), (32, 32), (33, 33), (34, 34), (35, 35), (36, 36), (37, 37), (38, 38), (39, 39), (40, 40), (41, 41), (42, 42), (43, 43), (44, 44), (45, 45), (46, 46), (47, 47), (48, 48), (49, 49), (50, 50), (51, 51), (52, 52), (53, 53), (54, 54), (55, 55), (56, 56), (57, 57), (58, 58), (59, 59), (60, 60), (61, 61), (62, 62), (63, 63), (64, 64), (65, 65), (66, 66), (67, 67), (68, 68), (69, 69), (70, 70), (71, 71), (72, 72), (73, 73), (74, 74), (75, 75), (76, 76), (77, 77), (78, 78), (79, 79), (80, 80), (81, 81), (82, 82), (83, 83), (84, 84), (85, 85), (86, 86), (87, 87), (88, 88), (89, 89), (90, 90), (91, 91), (92, 92), (93, 93), (94, 94), (95, 95), (96, 96), (97, 97), (98, 98), (99, 99)]
>>>
You could use vstack
In [36]: xy = np.vstack((x,y)).T
In [37]: xy.shape
Out[37]: (100, 2)
In [38]: xy[0]
Out[38]: array([0, 0])
In [39]: xy[1]
Out[39]: array([1, 1])
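Another option, sketched here, is np.column_stack, which builds the same (100, 2) array in one call. Converting through tolist() is an added step that yields plain Python tuples of ints, if the parenthesised form is what is wanted.

```python
import numpy as np

x = np.arange(100)
y = np.arange(100)

pairs = np.column_stack((x, y))  # shape (100, 2), one row per (x, y) pair

# tolist() converts to nested lists of Python ints; each row becomes a tuple
as_tuples = [tuple(row) for row in pairs.tolist()]
print(as_tuples[:3])  # [(0, 0), (1, 1), (2, 2)]
```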
