Python code with nested for loops is too slow - python

I am working on a problem from HackerRank. It starts with a zero array of size n and then applies operations to it. Say the array is x = [0, 0, 0, 0, 0, 0], so n = 6. Now consider the operation (the problem calls it a query) [1, 2, 5]. It means: add 5 to every element of x from index 0 to index 1 (the queries are one-based). So x becomes [5, 5, 0, 0, 0, 0]. There can be many such queries. At the end, we just need to find the max element of the final array x. A sample input is
5 3
1 2 100
2 5 100
3 4 100
So we need an array x of size 5 (initialized to zeros), and there are 3 queries to run on it. If we go through the queries, we find that the max element in the final array is 200. I wrote this with nested for loops: the outer loop runs through the queries and the inner loop updates the array x.
For small arrays my code works fine. But when n = 1000000 and the number of queries is m = 100000, the nested for loops run seemingly forever (it behaves like an infinite loop). I want to know how I can make this faster.
Here is the nested for loop:
# Construct a zero list of length n
worklist = list([0]*n)
# Loop through the queries
for query in queries:
    # Since the problem defines the queries vector
    # as one based index, we need to modify the
    # indices of query
    index0, index1 = query[0]-1, query[1]-1
    # Now construct the new list with addition
    for i in range(index0, index1+1):
        worklist[i] = worklist[i] + query[2]
I think I need to modify my algorithm for doing this. Suggestions welcome.

The Discussions page for this problem has an O(n) solution (O(n + m) if you count the m queries).
The problem is about overlapping ranges.
The basic idea: for each query, mark only an "add" point and a "remove" point in the array. At the end you go through the array once, keeping a running sum at the current index, and record the largest running sum as the answer.
For example
5 3
1 2 100
2 5 100
3 4 100
your array starts as (one extra slot at the end, so n + 1 entries)
0, 0, 0, 0, 0, 0
After the first query (1 2 100):
100, 0, -100, 0, 0, 0
This means that in the final scan, your loop computes the running sum step by step:
index 0, sum 100
index 1, sum 100
index 2, sum 0
index 3, sum 0
...
After the second query (2 5 100):
100, 100, -100, 0, 0, -100
Now the final scan computes:
index 0, sum 100
index 1, sum 200
index 2, sum 100
index 3, sum 100
index 4, sum 100
index 5, sum 0
so the max occurs at index 1.
After the third query (3 4 100):
100, 100, 0, 0, -100, -100
Now the final scan computes:
index 0, sum 100
index 1, sum 200
index 2, sum 200
index 3, sum 200
index 4, sum 100
index 5, sum 0
so the max is 200, first reached at index 1 and held through index 3.
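The marking scheme walked through above can be sketched in a few lines of Python. This is only a minimal sketch, not the code from the Discussions page: the function name max_after_queries and the (a, b, k) tuple layout are assumptions made for illustration, with a and b one-based and inclusive as in the problem statement.

```python
def max_after_queries(n, queries):
    """Return the largest value after applying all range-add queries.

    Hypothetical helper: each query is (a, b, k) with 1-based, inclusive
    bounds, as in the problem statement.
    """
    # One extra slot so the "remove" mark for b == n has somewhere to go.
    delta = [0] * (n + 1)
    for a, b, k in queries:
        delta[a - 1] += k   # "add" point: k starts counting here
        delta[b] -= k       # "remove" point: k stops counting after index b - 1
    best = running = 0
    for d in delta[:-1]:    # single scan, keeping the running sum
        running += d
        best = max(best, running)
    return best


print(max_after_queries(5, [(1, 2, 100), (2, 5, 100), (3, 4, 100)]))  # 200
```

Each query costs a constant amount of work and the final scan touches each slot once, so n = 1000000 with m = 100000 needs roughly 1.1 million steps rather than up to n times m.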

My answer addresses only the algorithmic part of your question. I'm going to simplify the I/O and not wrap it in a function, to leave you something on which to test your skills.
The idea is: don't store the result, store the cumulative delta for each position, and afterwards find the maximum with a cumulative summation.
Let's look at the first example given in the problem statement,
10 3
1 5 3
4 8 7
6 9 1
We start with l, a list of zeros of length n+1 (why n+1? because we need a little extra space to store a delta when b==n); we want to store just the deltas in l
n, m = 10, 3
l = [0]*(n+1)
We repeat the same ops for the 3 queries and report the state of our list l in a comment
a, b, k = 1, 5, 3
l[a-1] += k ; l[b] -= k
# [3, 0, 0, 0, 0, -3, 0, 0, 0, 0, 0]
a, b, k = 4, 8, 7
l[a-1] += k ; l[b] -= k
# [3, 0, 0, 7, 0, -3, 0, 0, -7, 0, 0]
a, b, k = 6, 9, 1
l[a-1] += k ; l[b] -= k
# [3, 0, 0, 7, 0, -2, 0, 0, -7, -1, 0]
current_max = 0
current_sum = 0
debug = 1
for num in l[:-1]:
    current_sum += num
    if debug: print(current_sum)
    current_max = max(current_max, current_sum)
print(current_max)
Executing the above code gives me
3
3
3
10
10
8
8
8
1
0
10
The first ten numbers are the elements of the summed list, to be compared with the problem statement, and the last number is the required maximum value

Related

Error when trying to implement MERGE algorithm merging to sorted lists of integers in python?

I'm new to both algorithms AND programming.
As an intro to merge sort, the chapter first introduces the MERGE procedure by itself. It merges an array consisting of 2 sorted sub-arrays into one sorted array.
I did the pseudocode on paper according to the book:
Source: "Introduction to Algorithms, Third Edition", Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein
Since I am implementing it in python3 I had to change some lines, given that indexing in Python starts at 0, unlike in the book's pseudocode.
Keep in mind that the input is one array that contains 2 SORTED sub-arrays which are then merged and sorted, and returned. I kept the prints in my code, so you can see my checks...
#!/anaconda3/bin/python3
import math
import argparse
# For now only MERGE slides ch 2 -- Im defining p q and r WITHIN the function
# But for MERGE_SORT p,q and r are defined as parameters!
def merge(ar):
    '''
    Takes as input an array. This array consists of 2 subarrays that ARE ALLREADY sorted
    (small to large). When splitting the array into half, the left
    part will be longer by one if not divisible by 2. These subarrays will be
    called left and right. Each of the subarrays must already be sorted. Merge() then
    merges these sorted arrays into one big sorted array. The sorted array is returned.
    '''
    print(ar)
    p=0 # for now defining always as 0
    if len(ar)%2==0:
        q=len(ar)//2-1 # because indexing starts from ZERO in py
    else:
        q=len(ar)//2 # left sub array will be 1 item longer
    r=len(ar)-1 # again -1 because indexing starts from ZERO in py
    print('p', p, 'q', q, 'r', r)
    # lets see if n1 and n2 check out
    n_1 = q-p+1 # lenght of left subarray
    n_2 = r-q # lenght of right subarray
    print('n1 is: ', n_1)
    print('n2 is: ', n_2)
    left = [0]*(n_1+1) # initiating zero list of lenght n1
    right=[0]*(n_2+1)
    print(left, len(left))
    print(right, len(right))
    # filling left and right
    for i in range(n_1):# because last value will always be infinity
        left[i] = ar[p+i]
    for j in range(n_2):
        right[j] = ar[q+j+1]
        #print(ar[q+j+1])
        #print(right[j])
    # inserting infinity at last index for each subarray
    left[n_1]=math.inf
    right[n_2]=math.inf
    print(left)
    print(right)
    # merging: initiating indexes at 0
    i=0
    j=0
    print('p', p)
    print('r', r)
    for k in range(p,r):
        if left[i] <= right[j]:
            ar[k]=left[i]
            # increase i
            i += 1
        else:
            ar[k]=right[j]
            #increase j
            j += 1
    print(ar)
#############################################################################################################################
# Adding parser
#############################################################################################################################
parser = argparse.ArgumentParser(description='MERGE algorithm from ch 2')
parser.add_argument('-a', '--array', type=str, metavar='', required=True, help='One List of integers composed of 2 sorted halves. Sorting must start from smallest to largest for each of the halves.')
args = parser.parse_args()
args_list_st=args.array.split(',') # list of strings
args_list_int=[]
for i in args_list_st:
    args_list_int.append(int(i))
if __name__ == "__main__":
    merge(args_list_int)
The problem:
When I try to sort the array as shown in the book, the merged array that is returned contains two 6s and the 7 is lost.
$ ./2.merge.py -a=2,4,5,7,1,2,3,6
[2, 4, 5, 7, 1, 2, 3, 6]
p 0 q 3 r 7
n1 is: 4
n2 is: 4
[0, 0, 0, 0, 0] 5
[0, 0, 0, 0, 0] 5
[2, 4, 5, 7, inf]
[1, 2, 3, 6, inf]
p 0
r 7
[1, 2, 2, 3, 4, 5, 6, 6]
However, this does not happen when the last number is anything higher than 6.
$ ./2.merge.py -a=2,4,5,7,1,2,3,8
[2, 4, 5, 7, 1, 2, 3, 8]
p 0 q 3 r 7
n1 is: 4
n2 is: 4
[0, 0, 0, 0, 0] 5
[0, 0, 0, 0, 0] 5
[2, 4, 5, 7, inf]
[1, 2, 3, 8, inf]
p 0
r 7
[1, 2, 2, 3, 4, 5, 7, 8]
I showed it to a colleague in my class, without success. I've also walked through it manually with numbers on paper snippets, again without success. I hope someone can find my silly mistake because I'm completely stuck.
Thanks
As r is the index of the last value in ar, you need to add one to it to make a range that also includes that final index:
for k in range(p, r + 1):
#                 ^^^^^
Note that your code could be greatly reduced if you would use list slicing.
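As a rough illustration of the slicing remark, here is one way the sentinel-based merge might look. It is a sketch rather than a drop-in replacement: the name merge_halves is made up, it returns a new list instead of overwriting ar, and it keeps the question's convention that the left half gets the extra item when the length is odd.

```python
import math


def merge_halves(ar):
    """Merge the two sorted halves of ar into a new sorted list."""
    mid = (len(ar) + 1) // 2            # left half is one longer for odd lengths
    left = ar[:mid] + [math.inf]        # slicing copies; inf is the sentinel
    right = ar[mid:] + [math.inf]
    i = j = 0
    merged = []
    for _ in range(len(ar)):            # one output slot per input item
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged


print(merge_halves([2, 4, 5, 7, 1, 2, 3, 6]))  # [1, 2, 2, 3, 4, 5, 6, 7]
```

Looping over range(len(ar)) fills exactly one slot per input item, which sidesteps the off-by-one in range(p, r).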
You made a very small mistake in this line:
for k in range(p,r):
Here your loop runs from p to r-1, so the last index, r, is never visited.
So you have to use
for k in range(p,r+1):
In the second test case, a=[2,4,5,7,1,2,3,8], you get the correct output even with the wrong code. You overwrite the values of ar in place, and your current code sorts the array up to index r-1; the value at index r stays whatever it was before merge ran, which is 8, and 8 happens to belong there.
Try this test case: [2, 4, 5, 8, 1, 2, 3, 7]
Your output will be [1, 2, 2, 3, 4, 5, 7, 7]
Hope this helps.

compute density map D

You are given two integers n and r, such that 1 <= r < n,
and a two-dimensional array W of size n x n.
Each element of this array is either 0 or 1.
Your goal is to compute the density map D for array W, using a radius of r.
The output density map is also a two-dimensional array,
where each value is the number of 1's in matrix W within the specified radius of that position.
Given the following input array W of size 5 and radius 1 (n = 5, r = 1)
1 0 0 0 1
1 1 1 0 0
1 0 0 0 0
0 0 0 1 1
0 1 0 0 0
Output (using Python):
3 4 2 2 1
4 5 2 2 1
3 4 3 3 2
2 2 2 2 2
1 1 2 2 2
Logic: take the input value in the first row, first column, which is 1, with r = 1. We check the element itself plus the elements 1 step to the right, left, top, top left, top right, bottom, bottom left and bottom right (where they exist), and sum them.
Should not use any 3rd party library.
I did it with a for loop and an inner for loop, checking every neighbor of each element. Is there a better approach?
Optimization: for each 1 in W, increment the count of every location whose neighborhood contains it.
For W of size n x n, the following algorithm still takes O(n^2) steps just to scan W. However, if W is sparse, i.e. the number of 1s (say k) << n x n, then instead of the roughly r x r x n x n steps of the approach stated in the question, it takes about n x n + r x r x k steps, which is much lower.
Given r assigned and W stored as
[[1, 0, 0, 0, 1],
[1, 1, 1, 0, 0],
[1, 0, 0, 0, 0],
[0, 0, 0, 1, 1],
[0, 1, 0, 0, 0]]
the following
output = [[0 for i in range(len(W[0]))] for j in range(len(W))]
for i in range(len(W)):
    for j in range(len(W[0])):
        if W[i][j] == 1:
            for off_i in range(-r,r+1):
                for off_j in range(-r,r+1):
                    if (0 <= i+off_i < len(W)) and (0 <= j+off_j < len(W[0])):
                        output[i+off_i][j+off_j] += 1
stores the required values in output.
For r = 1, output is as required:
[[3, 4, 2, 2, 1],
[4, 5, 2, 2, 1],
[3, 4, 3, 3, 2],
[2, 2, 2, 2, 2],
[1, 1, 2, 2, 2]]
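When W is dense, a different technique, a two-dimensional prefix sum (summed-area table), avoids the dependence on r altogether. The sketch below is an alternative to the loop above, not a restatement of it; the name density_map is hypothetical, and it assumes W is a non-empty rectangular list of lists.

```python
def density_map(W, r):
    """Count the 1s within a square of radius r around each cell."""
    rows, cols = len(W), len(W[0])
    # P[i][j] holds the sum of W[0..i-1][0..j-1]; row 0 and column 0 stay zero.
    P = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            P[i + 1][j + 1] = W[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]
    D = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Clamp the window to the grid; bottom/right bounds are exclusive.
            top, left = max(0, i - r), max(0, j - r)
            bottom, right = min(rows, i + r + 1), min(cols, j + r + 1)
            D[i][j] = (P[bottom][right] - P[top][right]
                       - P[bottom][left] + P[top][left])
    return D


W = [[1, 0, 0, 0, 1],
     [1, 1, 1, 0, 0],
     [1, 0, 0, 0, 0],
     [0, 0, 0, 1, 1],
     [0, 1, 0, 0, 0]]
for row in density_map(W, 1):
    print(*row)
```

Building P and reading each window both take a constant amount of work per cell, so the total is O(n^2) regardless of r or of how many 1s W contains.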

Compute the length of consecutive true values in a list

Essentially this problem can be split into two parts. I have a list of binary values that indicate whether a given signal is present or not. Each value also corresponds to a unit of time (in this case minutes), and I am trying to determine how long the signal lasts on average, given its occurrences within the list over the period I'm analyzing. For example, if I have the following list:
[0,0,0,1,1,1,0,0,1,0,0,0,1,1,1,1,0]
I can see that the signal occurs 3 separate times for variable lengths of time (i.e. in the first case for 3 minutes). If I want to calculate the average length of time for each occurrence however I need an indication of how many independent instances of the signal exist (i.e. 3). I have tried various index based strategies such as:
arb_ops.index(1)
to find the next occurrence of true values and correspondingly finding the next occurrence of 0 to find the length but am having trouble contextualizing this into a recursive function for the entire array.
You could use itertools.groupby() to group consecutive equal elements. To calculate a group's length convert the iterator to a list and apply len() to it:
>>> from itertools import groupby
>>> lst = [0 ,0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0 ,1, 1, 1, 1, 0]
>>> for k, g in groupby(lst):
...     g = list(g)
...     print(k, g, len(g))
...
0 [0, 0, 0] 3
1 [1, 1, 1] 3
0 [0, 0] 2
1 [1] 1
0 [0, 0, 0] 3
1 [1, 1, 1, 1] 4
0 [0] 1
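To get from these groups to the figures the question asks for, one possibility is to keep only the runs of 1s. A minimal sketch, where run_lengths is a made-up helper name:

```python
from itertools import groupby


def run_lengths(values):
    """Lengths of each run of consecutive 1s, in order of appearance."""
    return [sum(1 for _ in group) for key, group in groupby(values) if key == 1]


lst = [0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0]
lengths = run_lengths(lst)
print(lengths)                       # [3, 1, 4]
print(len(lengths))                  # 3 separate occurrences
print(sum(lengths) / len(lengths))   # average length, about 2.67 minutes
```

If the list contains no 1s at all, lengths is empty and the average would need a guard against division by zero.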
Another option may be MaskedArray.count, which counts non-masked elements of an array along a given axis:
import numpy.ma as ma
a = ma.arange(6).reshape((2, 3))
a[1, :] = ma.masked
a
masked_array(data =
[[0 1 2]
[-- -- --]],
mask =
[[False False False]
[ True True True]],
fill_value = 999999)
a.count()
3
You can extend Masked Arrays quite far...
@eugene-yarmash's solution with groupby is decent. However, if you wanted to go with a solution that requires no import, and where you do the grouping yourself --for learning purposes-- you could try this:
>>> l = [0,0,0,1,1,1,0,0,1,0,0,0,1,1,1,1,0]
>>> def size(xs):
...     sz = 0
...     for x in xs:
...         if x == 0 and sz > 0:
...             yield sz
...             sz = 0
...         if x == 1:
...             sz += 1
...     if sz > 0:
...         yield sz
...
>>> list(size(l))
[3, 1, 4]
I think this problem is actually pretty simple: you know you have a new signal if you see a value that is 1 and the previous value is 0 (or there is no previous value).
The code I provided is kind of long, but super simple, and done without imports.
signal = [0,0,0,1,1,1,0,0,1,0,0,0,1,1,1,1,0]
def find_number_of_signals(signal):
    signal_counter = 0
    signal_duration = 0
    for index in range(len(signal)):
        if signal[index] == 1:
            signal_duration += 1
            # a new signal starts at index 0 or right after a 0
            if index == 0 or signal[index - 1] == 0:
                signal_counter += 1
    print(signal_counter)
    print(signal_duration)
    print(signal_duration / signal_counter)
find_number_of_signals(signal)

Efficient way of re-numbering elements in an array

I am reasonably new to python and am trying to implement a genetic algorithm, but need some assistance with the code for one of the operations.
I have formulated the problem this way:
each individual I is represented by a string of M integers
each element e in I takes a value from 0 to N
every number from 0 - N must appear in I at least once
the value of e is not important, so long as each uniquely valued element takes the same unique value (think of them as class labels)
e is less than or equal to N
N can be different for each I
After applying the crossover operation I can potentially generate children which violate one or more of these constraints, so I need to find a way to re-number the elements so that they retain their properties but fit the constraints.
for example:
parent_1 (N=5): [1 3 5 4 2 1|0 0 5 2]
parent_2 (N=3): [2 0 1 3 0 1|0 2 1 3]
*** crossover applied at "|" ***
child_1: [1 3 5 4 2 1 0 2 1 3]
child_2: [2 0 1 3 0 1 0 0 5 2]
child_1 obviously still satisfies all of the constraints, as N = 5 and all values 0-5 appear at least once in the array.
The problem lies with child 2 - if we use the max(child_2) way of calculating N we get a value of 5, but if we count the number of unique values then N = 4, which is what the value for N should be. What I am asking (in a very long winded way, granted) is what is a good, pythonic way of doing this:
child_2: [2 0 1 3 0 1 0 0 5 2]
*** some python magic ***
child_2': [2 0 1 3 0 1 0 0 4 2]
*or*
child_2'': [0 1 2 3 1 2 1 1 4 0]
child_2'' is there to illustrate that the values themselves dont matter, so long as each element of a unique value maps to the same value, the constraints are satisfied.
Here is what I have tried so far:
value_map = []
for el in child:
    if el not in value_map:
        value_map.append(el)
for ii in range(0,len(child)):
    child[ii] = value_map.index(child[ii])
This approach works and returns a result similar to child_2'', but I can't imagine that it is very efficient in the way it iterates over the string twice, so I was wondering if anyone has any suggestions for how to make it better.
Thanks, and sorry for such a long post for such a simple question!
You will need to iterate over the list more than once; I don't think there's any way around this. After all, you first have to determine the number of different elements (first pass) before you can start changing elements (second pass). Note, however, that depending on the number of different elements you might have up to O(n^2) due to the repeated calls to index and not in, which are O(n) on a list.
Alternatively, you could use a dict instead of a list for your value_map. A dictionary has much faster lookup than a list, so this way the complexity should indeed be on the order of O(n). You can do this using (1) a dictionary comprehension to determine the mapping of old to new values, and (2) a list comprehension for creating the updated child.
value_map = {el: i for i, el in enumerate(set(child))}
child2 = [value_map[el] for el in child]
Or change the child in-place using a for loop.
for i, el in enumerate(child):
    child[i] = value_map[el]
You can do it with a single loop like this:
value_map = []
result = []
for el in child:
    if el not in value_map:
        value_map.append(el)
    result.append(value_map.index(el))
One solution I can think of is:
Determine the value of N and determine the unused integers (this forces you to iterate over the array once).
Go through the array and each time you meet a number greater than N, map it to an unused integer.
This forces you to go through the array twice, but it should be faster than your example (which forces you to search value_map for each element of the array at each iteration).
child = [2, 0, 1, 3, 0, 1, 0, 0, 5, 2]
used = set(child)
N = len(used) - 1
unused = set(range(N+1)) - used
value_map = dict()
for i, e in enumerate(child):
    if e <= N:
        continue
    if e not in value_map:
        value_map[e] = unused.pop()
    child[i] = value_map[e]
print(child) # [2, 0, 1, 3, 0, 1, 0, 0, 4, 2]
I like @Selçuk Cihan's answer. It can also be done in place.
>>> child = [2, 0, 1, 3, 0, 1, 0, 0, 5, 2]
>>>
>>> value_map = []
>>> for i in range(len(child)):
...     el = child[i]
...     if el not in value_map:
...         value_map.append(el)
...     child[i] = value_map.index(el)
...
>>> child
[0, 1, 2, 3, 1, 2, 1, 1, 4, 0]
I believe that this works, although I didn't test it for more than the single case that is given in the question.
The only thing that bothers me is that value_map appears three times in the code...
def renumber(individual):
    """
    >>> renumber([2, 0, 1, 3, 0, 1, 0, 0, 4, 2])
    [0, 1, 2, 3, 1, 2, 1, 1, 4, 0]
    """
    value_map = {}
    return [value_map.setdefault(e, len(value_map)) for e in individual]
Here is a fast solution, which iterates over the list only once.
a = [2, 0, 1, 3, 0, 1, 0, 0, 5, 2]
b = [-1]*len(a)  # lookup table; assumes every value in a is in range(len(a))
j = 0
for i in range(len(a)):
    if b[a[i]] == -1:
        b[a[i]] = j
        a[i] = j
        j += 1
    else:
        a[i] = b[a[i]]
print(a) # [0, 1, 2, 3, 1, 2, 1, 1, 4, 0]

Bubble sort in Python, what are these arguments for?

def bbsort(list1):
    for passnum in range(len(list1)-1,0,-1):
        for i in range (passnum):
            if list1[i]>list1[i+1]:
                temp =list1[i]
                list1[i] = list1[i+1]
                list1[i+1] = temp
This is code for a bubble sort, but what is the use of -1,0,-1 in the outer for loop?
You are looking at arguments to the range() function; the numbers set up the range to start at the list length, minus one, then stepping down to 1 (the end point is not included in the range).
So there are 3 arguments, the first is len(list1) - 1, the second is 0 and the third is -1.
Say the list is length 5, then range(5 - 1, 0, -1) will produce a list with 4, 3, 2, and 1:
>>> list(range(4, 0, -1))
[4, 3, 2, 1]
The for loop steps over these values:
>>> for i in range(4, 0, -1):
...     print(i)
...
4
3
2
1
The for loop assigns these numbers to passnum, and the next nested loop uses that to create a new range(). The first time, that inner range will go from 0 to len(list1) - 1 (exclusive), the next time from 0 to len(list1) - 2, etc, until the last time, when it'll run from 0 to 1, always excluding the stop index. For a list of length 5, that means the inner loop first assigns 0, 1, 2, 3 to i, then 0, 1, 2, then 0, 1, then 0.
The code is missing an opportunity to use Python sequence assignment to swap two elements:
def bbsort(list1):
    for passnum in range(len(list1) - 1, 0, -1):
        for i in range(passnum):
            if list1[i] > list1[i + 1]:
                list1[i], list1[i + 1] = list1[i + 1], list1[i]
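To see the two ranges interact, the loop bounds described in this answer can be printed for a list of length 5. This is just a small illustrative trace; pass_indices is a hypothetical helper, not part of the original code.

```python
def pass_indices(length):
    """Map each passnum to the indices the inner loop visits on that pass."""
    return {passnum: list(range(passnum))
            for passnum in range(length - 1, 0, -1)}


for passnum, indices in pass_indices(5).items():
    print(passnum, indices)
# 4 [0, 1, 2, 3]
# 3 [0, 1, 2]
# 2 [0, 1]
# 1 [0]
```

Each pass compares index i with i + 1, so the first pass reaches the last element and every later pass stops one position earlier.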
The first -1 subtracts 1 from the length of the input list, because indexes start from 0, not 1. The 0 is the stop value, which is excluded, so the range ends at 1. The second -1 is the step, which tells it to move backwards through the range.
Example:
list(range(5, 2, -1)) is [5, 4, 3]
The general form is range(start, stop, step)
