Find missing data indices using python

What is the optimal way to return the indices where a 1-d array has missing data? Missing data is represented by zeros, but a value may also be genuinely zero rather than missing. We only want the indices where three or more zeros occur consecutively. For example, for the array [1,2,3,4,0,1,2,3,0,0,0,1,2,3] the function should return only the indices of the second run of zeros, not the isolated zero at index 4.
This is actually an interview question :) The challenge is to do it as efficiently as possible in one line.

Keep track of the count of zeros in the current run. Then, when a run of at least three zeros finishes, calculate its indexes.
def find_dx_of_missing(a):
    runsize = 3  # 3 or more, change to 4 if you need "more than 3"
    zcount = 0
    for i, n in enumerate(a):
        if n == 0:
            zcount += 1
        else:
            if zcount >= runsize:
                for j in range(i - zcount, i):
                    yield j
            zcount = 0
    if zcount >= runsize:  # needed if sequence ends with missing
        i += 1
        for j in range(i - zcount, i):
            yield j
Examples:
>>> a = [1,2,3,4,0,1,2,3,0,0,0,1,2,3]
>>> list(find_dx_of_missing(a))
[8, 9, 10]
>>> a = [0,0,0,3,0,5,0,0,0,0,10,0,0,0,0,0]
>>> list(find_dx_of_missing(a))
[0, 1, 2, 6, 7, 8, 9, 11, 12, 13, 14, 15]
Edit: Since you need a one-liner, here are two candidates, assuming a is your list and n is the smallest run of zeros that counts as missing data (the first one needs import itertools):
[v for vals in (list(vals) for iszeros, vals in itertools.groupby(range(len(a)), lambda dx, a=a: a[dx]==0) if iszeros) for v in vals if len(vals) >= n]
Or
sorted({dx for i in range(len(a)-n+1) for dx in range(i, i+n) if set(a[i:i+n]) == {0}})
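For readers who find the one-liners hard to follow, here is a minimal multi-line sketch of the same groupby idea. It groups the values themselves rather than the indices and keeps a running position; the function name zero_run_indices and the default n=3 are assumptions made for illustration.

```python
from itertools import groupby

def zero_run_indices(a, n=3):
    """Return indices of zeros that belong to runs of at least n zeros."""
    result = []
    pos = 0  # index of the first element of the current group
    for is_zero, group in groupby(a, key=lambda v: v == 0):
        length = sum(1 for _ in group)  # consume the group to get its size
        if is_zero and length >= n:
            result.extend(range(pos, pos + length))
        pos += length
    return result

print(zero_run_indices([1, 2, 3, 4, 0, 1, 2, 3, 0, 0, 0, 1, 2, 3]))
```

Each element is visited once, so the work is linear in the length of the list.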

Related

Sum by Factors From Codewars.com

Synopsis: my code runs well with simple lists, but when I submit an attempt, it times out after the 4 basic tests.
Since I don't want to look at other people's solutions, I'm asking whether someone can show me which part of the code is hurting the execution time, so that I can focus on modifying only that part.
Note: I don't want a final solution, I just want to know which part of the code I have to change, please.
Exercise:
Given an array of positive or negative integers
I= [i1,..,in]
you have to produce a sorted array P of the form
[ [p, sum of all ij of I for which p is a prime factor (p positive) of ij] ...]
P will be sorted by increasing order of the prime numbers. The final result has to be given as a string in Java, C# or C++ and as an array of arrays in other languages.
Example:
I = [12, 15] # result = [[2, 12], [3, 27], [5, 15]]
[2, 3, 5] is the list of all prime factors of the elements of I, hence the result.
Notes: It can happen that a sum is 0 if some numbers are negative!
Example: I = [15, 30, -45] 5 divides 15, 30 and (-45) so 5 appears in the result, the sum of the numbers for which 5 is a factor is 0 so we have [5, 0] in the result amongst others.
def sum_for_list(lst):
    if len(lst) == 0:
        return []
    max = sorted(list(map(lambda x: abs(x), lst)), reverse = True)[0]
    #create the list with the primes, already filtered
    primes = []
    for i in range (2, max + 1):
        for j in range (2, i):
            if i % j == 0:
                break
        else:
            for x in lst:
                if x % i == 0:
                    primes.append([i])
                    break
    #i add the sums to the primes
    for i in primes:
        sum = 0
        for j in lst:
            if j % i[0] == 0:
                sum += j
        i.append(sum)
    return primes
I tried to simplify the code as much as I could, but got the same result.
I also tried other ways to iterate in the first step:
# Find the maximum value in the list
from functools import reduce
max = reduce(lambda x,y: abs(x) if abs(x)>abs(y) else abs(y), lst)
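
A note on where the time probably goes: the loop that tests every i up to max for primality by trial division grows roughly with the square of the largest value, so it is the likely bottleneck, while finding the maximum is cheap either way. The following is only a rough sketch of one alternative, assuming it is acceptable to factorize each element separately; the names prime_factors and sum_for_list_sketch are hypothetical and not part of the kata.

```python
from collections import defaultdict

def prime_factors(n):
    """Return the set of prime factors of abs(n), by trial division up to sqrt(n)."""
    n = abs(n)
    factors = set()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:  # whatever remains is itself a prime factor
        factors.add(n)
    return factors

def sum_for_list_sketch(lst):
    sums = defaultdict(int)  # prime -> sum of the elements it divides
    for x in lst:
        for p in prime_factors(x):
            sums[p] += x
    return [[p, sums[p]] for p in sorted(sums)]

print(sum_for_list_sketch([12, 15]))
```

The work per element is bounded by the square root of its absolute value rather than by the largest value in the list.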

Find large number in a list, where all previous numbers are also in the list

I am trying to implement a Yellowstone Integer calculation which suggests that "Every number appears exactly once: this is a permutation of the positive numbers". The formula I have implemented to derive the values is as follows:
import math
yellowstone_list = []
item_list = []
i = 0
while i <= 1000:
    if i <= 3:
        yellowstone_list.append(i)
    else:
        j = 1
        inList = 1
        while inList == 1:
            minus_1 = math.gcd(j, yellowstone_list[i-1])
            minus_2 = math.gcd(j, yellowstone_list[i-2])
            if minus_1 == 1 and minus_2 > 1:
                if j in yellowstone_list:
                    inList = 1
                else:
                    inList = 0
            j += 1
        yellowstone_list.append(j - 1)
    item_list.append(i)
    i += 1
The issue becomes that as i increases, the time taken for the formula to determine the value of j also increases (naturally as i is increasingly further away from the start point of j).
What I would like to do is determine the largest value of j in the yellowstone_list, where all the values of 1 to j are already in the list.
As an example, in the below list, j would be 9, as all the values 0 - 9 are in the list:
yellowstone_list = [0, 1, 2, 3, 4, 9, 8, 15, 14, 5, 6, 25, 12, 35, 16, 7]
Any suggestions on how to implement this in an efficient manner?

For the "standalone" problem as stated the algorithm would be:
Sort the list.
Run a counter from 0 while in parallel traversing the list. Once the counter value is unequal to the list element, then you have found one-past the wanted element.
Something like the following:
x=[0, 1, 2, 3, 4, 9, 8, 15, 14, 5, 6, 25, 12, 35, 16, 7]
y=sorted(x)
for i in range(1, len(y)):
    if y[i]!=i:
        print(i-1)
        break
But in your case it appears that the list is being built gradually. So each time a number is added, it can be inserted in sorted position and checked against the previous element, and the traversal can resume from where it last stopped, which is more efficient.
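A minimal sketch of that incremental idea, with one substitution stated plainly: instead of keeping a sorted list, it keeps a set of seen values plus a pointer that only ever moves forward. The class name PrefixTracker and its attributes are hypothetical, chosen for illustration.

```python
class PrefixTracker:
    """Tracks the largest j such that every value 0..j has been added."""

    def __init__(self):
        self.seen = set()
        self.next_missing = 0  # smallest non-negative integer not yet seen

    def add(self, value):
        self.seen.add(value)
        # Resume from the previous position instead of rescanning from 0.
        while self.next_missing in self.seen:
            self.next_missing += 1

    @property
    def largest_complete(self):
        return self.next_missing - 1

tracker = PrefixTracker()
for v in [0, 1, 2, 3, 4, 9, 8, 15, 14, 5, 6, 25, 12, 35, 16, 7]:
    tracker.add(v)
print(tracker.largest_complete)
```

Because the pointer never moves backwards, the total scanning work over all insertions is proportional to the final pointer value, and set membership tests avoid the linear cost of `in` on a list.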

This is how I would do it:
lst.sort()
for c, i in enumerate(lst):
    if c + 1 < len(lst) and lst[c + 1] != i + 1:
        j = i
        break
else:
    j = i
Basically, the list is sorted, and then, it loops through each value, checking if the next value is only 1 greater than the current.

After some time to sit down and think about it, and using the suggestions to sort the list, I came up with two solutions:
Sorting
I implemented @eugebe Sh.'s solution within the while i <= 1000 loop as follows:
while i <= 1000:
    m = sorted(yellowstone_list)
    for n in range(1, len(m)):
        if m[n]!=n:
            break
    if i == 0:
        ....
In List
I incremented a counter for as long as its value was in the list, using the "in" operator, also within the while i <= 1000 loop, as follows:
while i <= 1000:
    while k in yellowstone_list:
        k += 1
    if i == 0:
        ....
Running both codes 100 times, I got the following:
Sorting: Total: 1:56.403527 seconds, Average: 1.164035 seconds.
In List: Total: 1:14.225230 seconds, Average: 0.742252 seconds.

Trying to optimize this code: iterating over a list to replace its values

I am trying to do a challenge in Python, the challenge consists of :
Given an array X of positive integers, its elements are to be transformed by running the following operation on them as many times as required:
if X[i] > X[j] then X[i] = X[i] - X[j]
When no more transformations are possible, return its sum ("smallest possible sum").
Basically you pick two non-equal numbers from the array and replace the larger of them with their difference. You repeat this until all numbers in the array are the same.
I tried a basic approach using min and max, but there is another constraint, which is time. I always get a timeout because my code is not optimized and takes too long to execute. Can you please suggest some ways to make it run faster?
def solution(array):
    while len(set(array)) != 1:
        array[array.index(max(array))] = max(array) - min(array)
    return sum(array)
Thank you so much !

EDIT
I will avoid spoiling the challenge... because I didn't find the solution in Python. But here's the general design of an algorithm that works in Kotlin (in 538 ms). In Python I'm stuck in the middle of the performance tests.
Some thoughts:
First, the idea to remove the minimum from the other elements is good: the modulo (we remove the minimum as long as it is possible) will be small.
Second, if this minimum is 1, the array will be soon full of 1s and the result is N (the len of the array).
Third, if all elements are equal, the result is N times the value of one element.
The algorithm
The idea is to keep two indices: i is the current index that cycles on 0..N and k is the index of the current minimum.
At the beginning, k = i = 0 and the minimum is m = arr[0]. We advance i until one of the following happens:
i == k => we made a full cycle without updating k, return N*m;
arr[i] == 1 => return N;
arr[i] < m => update k and m;
arr[i] > m => compute the new value of arr[i] (that is arr[i] % m, or m if arr[i] is a multiple of m). If that's not m, it is arr[i] % m < m: update k and m;
arr[i] == m => pass.
Basically, we use a rolling minimum and compute the modulos on the fly until all elements are the same. That spares us from periodically computing the min of the array.
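The cases above can be sketched in Python roughly as follows. This is an illustrative reading of the description, not the Kotlin code the answer refers to; the function name smallest_sum_rolling is an assumption, and the input is assumed to contain only positive integers.

```python
def smallest_sum_rolling(arr):
    arr = list(arr)  # work on a copy so the caller's list is untouched
    n = len(arr)
    if n == 0:
        return 0
    k, m = 0, arr[0]  # index and value of the current minimum
    if m == 1:
        return n
    i = 1 % n  # start just after k (wraps to 0 when n == 1)
    while i != k:  # reaching k again means a full cycle without an update
        v = arr[i]
        if v == 1:
            return n
        if v < m:
            k, m = i, v
        elif v > m:
            r = v % m
            if r == 0:
                arr[i] = m  # a multiple of m collapses to m itself
            else:
                arr[i] = r  # a new, strictly smaller minimum
                k, m = i, r
                if r == 1:
                    return n
        i = (i + 1) % n
    return n * m

print(smallest_sum_rolling([22, 14, 6, 2]))
```

The loop ends because m strictly decreases on every update, and a full cycle without an update means every element already equals m.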
PREVIOUS ANSWER
As @BallpointBen wrote, you'll get n times the GCD of all numbers. But that's cheating ;)! If you want to find a solution by hand, you can optimize your code.
As long as you don't have N identical numbers, you call the set, max (twice!), min and index functions on array. Those functions are pretty expensive. The number of iterations depends on the array.
Imagine the array is sorted in reverse order: [22, 14, 6, 2]. You can replace 22 by 22-14, 14 by 14-6, ... and get: [8, 8, 4, 2]. This is still sorted, so replace again (where the neighbours differ): [8, 4, 2, 2], then [4, 2, 2, 2], then [2, 2, 2, 2]. Actually, in the first pass 14 could be replaced by 14-2*6 = 2 (as in the classic GCD computation), giving the following sequence:
[22, 14, 6, 2]
[8, 2, 2, 2]
[2, 2, 2, 2]
The convergence is fast.
def solution2(arr):
    N = len(arr)
    end = False
    while not end:
        arr = sorted(arr, reverse=True)
        end = True
        for i in range(1, N):
            while arr[i-1] > arr[i]:
                arr[i-1] -= arr[i]
                end = False
    return sum(arr)
A benchmark:
import random
import timeit
arr = [4*random.randint(1, 100000) for _ in range(100)] # GCD will be 4 or a multiple of 4
assert solution(list(arr)) == solution2(list(arr))
print(timeit.timeit(lambda: solution(list(arr)), number=100))
print(timeit.timeit(lambda: solution2(list(arr)), number=100))
Output:
2.5396839629975148
0.029025810996245127
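
For comparison, the GCD observation mentioned at the start of this answer can be sketched in a couple of lines with the standard library. The function name smallest_sum_gcd is hypothetical, and the sketch assumes a non-empty list of positive integers.

```python
from functools import reduce
from math import gcd

def smallest_sum_gcd(arr):
    # Repeated subtraction converges to the GCD of all elements,
    # so every element ends up equal to that GCD.
    return len(arr) * reduce(gcd, arr)

print(smallest_sum_gcd([22, 14, 6, 2]))
```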

def solution(a):
    N = len(a)
    end = False
    while not end:
        a = sorted(a, reverse=True)
        small = min(a)
        end = True
        for i in range(1, N):
            if a[i-1] > small:
                a[i-1] = a[i-1]%small if a[i-1]%small !=0 else small
                end = False
    return sum(a)
I made it faster with a slight change.

This solution worked for me. I iterate over the list only once. Initially I find the minimum, and while iterating over the list I replace each element with the remainder of its division by the minimum. If I find a remainder equal to 1, the result is trivially 1 multiplied by the length of the list. Otherwise, if the remainder is less than the minimum, I replace the variable m with the new minimum and continue. Once the iteration is finished, the result is the minimum multiplied by the length of the list.
Here is the code:
def solution(a):
    L = len(a)
    if L == 1:
        return a[0]
    m=min(a)
    for i in range(L):
        if a[i] != m:
            if a[i] % m != 0:
                a[i] = a[i]%m
                if a[i]<m:
                    m=a[i]
            elif a[i] % m == 0:
                a[i] -= m * (a[i] // m - 1)
            if a[i]==1:
                return 1*L
    return m*L

If numbers in list are equal to n, print out their indices

The Task:
You are given two parameters, an array and a number. For all the numbers that make n in pairs of two, return the sum of their indices.
input is: arr = [1, 4, 2, 3, 0, 5] and n = 7
output: 11
since the perfect pairs are (4,3) and (2,5) with indices 1 + 3 + 2 + 5 = 11
So far I have this, which prints out the perfect pairs
from itertools import combinations
def pairwise(arr, n):
    for i in combinations(arr, 2): # for index in combinations in arr, 2 elements
        if i[0] + i[1] == n: # if their sum is equal to n
            print(i[0],i[1])
Output:
4,3 2,5
However, does anyone have tips on how to print the indices of the perfect pairs? Should I use numpy or should I change the whole function?

Instead of generating combinations of array elements you can generate combinations of indices.
from itertools import combinations
def pairwise(arr, n):
    s = 0
    for i in combinations(range(len(arr)), 2): # for index in combinations in arr, 2 elements
        if arr[i[0]] + arr[i[1]] == n: # if their sum is equal to n
            # print(arr[i[0]],arr[i[1]])
            # print(i[0],i[1])
            s += i[0] + i[1]
    # print(s)
    return s

You can use a dictionary mapping the values to their indexes:
from itertools import combinations
def pairwise(arr, n):
    d = {b:a for a,b in enumerate(arr)} #create indexed dict
    for i in combinations(arr, 2): # for index in combinations in arr, 2 elements
        if i[0] + i[1] == n: # if their sum is equal to n
            print(d[i[0]],d[i[1]])

Rather than generating combinations and checking if they add up to n, it's faster to turn your list into a dict where you can look up the exact number you need to add up to n. For each number x you can easily calculate n - x and then look up the index of that number in your dict.
This only works if the input list doesn't contain any duplicate numbers.
arr = [1, 4, 2, 3, 0, 5]
n = 7
indices = {x: i for i, x in enumerate(arr)}
total = 0
for i, x in enumerate(arr):
    remainder = n - x
    if remainder in indices:
        idx = indices[remainder]
        total += i + idx
# the loop counts each pair twice (once as [a,b] and once as [b,a]), so
# we have to divide the result by two to get the correct value
total //= 2
print(total) # output: 11
If the input does contain duplicate numbers, you have to rewrite the code to store more than one index in the dict:
import collections
arr = [1, 4, 2, 3, 0, 5, 2]
n = 7
indices = collections.defaultdict(list)
for i, x in enumerate(arr):
    indices[x].append(i)
total = 0
for i, x in enumerate(arr):
    remainder = n - x
    for idx in indices[remainder]:
        total += i + idx
# the loop counts each pair twice (once as [a,b] and once as [b,a]), so
# we have to divide the result by two to get the correct value
total //= 2
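A possible single-pass variant of the same lookup idea, sketched under the assumption that every pair of distinct positions (i, j) with i < j should be counted once. It only looks up indices that were seen earlier, so no halving is needed and an element can never be paired with itself (which matters when n is even and n/2 appears in the list). The function name pair_index_sum is hypothetical.

```python
from collections import defaultdict

def pair_index_sum(arr, n):
    seen = defaultdict(list)  # value -> indices where it appeared so far
    total = 0
    for i, x in enumerate(arr):
        for j in seen[n - x]:  # only earlier positions, so j < i
            total += i + j
        seen[x].append(i)
    return total

print(pair_index_sum([1, 4, 2, 3, 0, 5], 7))
```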

You should use the naive approach here:
process each element of the array together with its index
for each element, test all elements after it (to avoid duplicates) and, whenever their sum is the expected number, add the sum of their indices
Code could be:
def myfunc(arr, number):
    tot = 0
    for i, val in enumerate(arr):
        for j in range(i+1, len(arr)):
            if val + arr[j] == number:
                tot += i + j
    return tot
Control:
>>> myfunc([1, 4, 2, 3, 0, 5], 7)
11
>>> myfunc([2, 4, 6], 8)
2

Best way to replace values in a list based on many indexes

I have list like this:
l = [1,2,3,4,5,6,7,8,9,10]
idx = [2,5,7]
I want to replace values in l with 0, using indexes from idx. For now I do:
for i in idx:
    l[i] = 0
This gives: l = [1, 2, 0, 4, 5, 0, 7, 0, 9, 10]
Is there a better, faster, more pythonic way? This is only a small example, but what if I have huge lists?

If you're talking about huge lists, you should really try not to create a new list, as the new list will require space in memory in addition to your input lists.
Now, let's consider the indices that you want to set to 0. These indices are contained in a list (idx), which itself could be just as long as the list with numbers (l). So, if you were to do something like this:
for i in range(len(l)):
    if i in idx:
        l[i] = 0
it would take O(mn) time, where m is the number of elements in idx and n is the number of elements in l. This is a really slow algorithm.
Now, you really can't do much faster than O(m), seeing as you have to consider every element in idx. But since m is strictly bounded from above by n, it's definitely a better strategy to loop over idx instead:
for i in idx:
    l[i] = 0
But let's consider that idx might contain elements that are not valid indices of l (i.e. there is at least one element in idx whose value is greater than the largest index in l). Then, you could do this:
for i in idx:
    if i<len(l):
        l[i] = 0
or:
for ind in (i for i in idx if i<len(l)):
    l[ind] = 0
Now, this makes O(m) comparisons, which could potentially be improved upon. For example, if idx were sorted, then a modified binary search could provide the appropriate slice of idx that has valid indices:
def binSearch(l, idx, i=0, j=None):
    # returns how many leading entries of the sorted list idx are valid indices of l
    # note that the list is not sliced, unlike some common binary search implementations. This saves on additional space
    if j is None:
        j = len(idx)-1
    if i > j:
        return i
    mid = (i+j)//2
    if idx[mid] > len(l)-1:
        return binSearch(l, idx, i, mid-1)
    else:
        return binSearch(l, idx, mid+1, j)
So now, you could replace only the valid indices without any per-element comparisons:
for ind in range(binSearch(l, idx)):
    l[idx[ind]] = 0
Note that this approach takes O(log m) time to apply binSearch on idx in the first place.
This works only if idx is already sorted. If that is an invalid assumption, you would have to sort it yourself, which costs O(m log m) time and is slower than the aforementioned O(m) implementation.
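As an alternative to a hand-written search, the standard library's bisect module can find the same cutoff. This is a minimal sketch, assuming idx is sorted in ascending order and contains no negative values; the function name zero_valid_sorted is hypothetical.

```python
import bisect

def zero_valid_sorted(l, idx):
    # Position of the first entry in idx that is >= len(l);
    # everything before it is a valid index of l.
    cutoff = bisect.bisect_left(idx, len(l))
    for pos in range(cutoff):
        l[idx[pos]] = 0
    return l

print(zero_valid_sorted([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [2, 5, 7, 12, 40]))
```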
Yet, if idx were sufficiently large, you could try a distributed approach, with multiprocessing:
import multiprocessing as mp
def replace(l, idx):
    numWorkers = mp.cpu_count()*2 -1
    qIn = mp.Queue(maxsize=len(idx))
    qOut = mp.Queue()
    procs = [mp.Process(target=slave, args=(l, qIn, qOut)) for _ in range(numWorkers)]
    for p in procs:
        p.start()
    for i in idx:
        qIn.put(i)
    for _ in procs:
        qIn.put(None) # one sentinel per worker, so that every worker's loop terminates
    numFinished = 0
    while numFinished != numWorkers:
        i = qOut.get()
        if i is None:
            numFinished += 1
            continue
        l[i] = 0
    for p in procs:
        p.join()
def slave(L, qIn, qOut):
    for i in iter(qIn.get, None):
        if i< len(L):
            qOut.put(i)
    qOut.put(None)
Of course, you could further improve this by adding the binSearch to the distributed solution as well, but I'll leave that to you.

Don't create another list for index. Instead:
l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
index = 1
while index < len(l):
    if index == 2:
        l[index] = 0
    elif index == 5:
        l[index] = 0
    elif index == 7:
        l[index] = 0
    index += 1
print(l)
You do not have to use "elif" statements if you combine them all on one line with an "or" statement. For example:
l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
index = 1
while index < len(l):
    if (index == 2) or (index == 5) or (index == 7):
        l[index] = 0
    index += 1
print(l)

I think this is perfectly fine. You could write a list comprehension, like this:
[v if i not in idx else 0 for i, v in enumerate(l)]
Or change it in place by iterating over l:
for i, v in enumerate(l):
    if i in idx:
        l[i] = 0
But I find that harder to read, and very likely slower. I don't think any other solution will beat yours by a significant margin, ignoring CPU caching.
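If the comprehension form is preferred anyway, the repeated `i in idx` test on a list is what makes it slow. A minimal sketch that converts idx to a set first, so each membership test takes constant time on average; the function name zero_at is hypothetical.

```python
def zero_at(l, idx):
    targets = set(idx)  # average O(1) membership tests
    return [0 if i in targets else v for i, v in enumerate(l)]

print(zero_at([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [2, 5, 7]))
```

Note that this builds a new list, which costs extra memory compared with the in-place loop from the question.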
