Related
I am working on this coding challenge named Similarity Measure. Now the problem is my code works fine for some test cases, and failed due to the Time Limit Exceed problem. However, my code is not wrong, takes more than 25 sec for input of range 10^4.
I need to know what I can do to make it more efficient, I cannot think on any better solution than my code.
Question goes like this:
Problems states that given an array of positive integers, and now we have to answer based upon the Q queries.
Query: Given two indices L,R, determine the maximum absolute difference of index of two same elements lies between L and R
If in a range, there are no two same inputs then return 0
INPUT FORMAT
The first line contains N, no. of elements in the array A
The Second line contains N space separated integers that are elements of the array A
The third line contains Q the number of queries
Each of the Q lines contains L, R
CONSTRAINTS
1 <= N, Q <= 10^4
1 <= Ai <= 10^4
1 <= L, R <= N
OUTPUT FORMAT
For each query, print the ans in a new line
Sample Input
5
1 1 2 1 2
5
2 3
3 4
2 4
3 5
1 5
Sample Output
0
0
2
2
3
Explanation
[2,3] - No two elements are same
[3,4] - No two elements are same
[2,4] - there are two 1's so ans = |4-2| = 2
[3,5] - there are two 2's so ans = |5-3| = 2
[1,5] - there are three 1's and two 2's so ans = max(|4-2|, |5-3|, |4-1|, |2-1|) = 3
Here is my algorithm:
To take the input and test the range in a different method
Input will be L, R and the Array
For difference between L and R equal to 1, check if the next element is equal, return 1 else return 0
For difference more than 1, loop through array
Make a nested loop to check for the same element, if yes, store the difference into maxVal variable
Return maxVal
My Code:
def ansArray(L, R, arr):
maxVal = 0
if abs(R - L) == 1:
if arr[L-1] == arr[R-1]: return 1
else: return 0
else:
for i in range(L-1, R):
for j in range(i+1, R):
if arr[i] == arr[j]:
if (j-i) > maxVal: maxVal = j-i
return maxVal
if __name__ == '__main__':
input()
arr = (input().split())
for i in range(int(input())):
L, R = input().split()
print(ansArray(int(L), int(R), arr))
Please help me with this. I really want to learn a different and a more efficient way to solve this problem. Need to pass all the TEST CASES. :)
You can try this code:
import collections
def ansArray(L, R, arr):
dct = collections.defaultdict(list)
for index in range(L - 1, R):
dct[arr[index]].append(index)
return max(lst[-1] - lst[0] for lst in dct.values())
if __name__ == '__main__':
input()
arr = (input().split())
for i in range(int(input())):
L, R = input().split()
print(ansArray(int(L), int(R), arr))
Explanation:
dct is a dictionary that for every seen number keeps a list of indices. The list is sorted so lst[-1] - lst[0] will give maximum absolute difference for this number. Applying max to all this differences you get the answer. Code complexity is O(R - L).
This can be solved as O(N) approximately the following way:
from collections import defaultdict
def ansArray(L, R, arr) :
# collect the positions and save them into the dictionary
positions = defaultdict(list)
for i,j in enumerate(arr[L:R+1]) :
positions[j].append(i)
# create the list of the max differences in index
max_diff = list()
for vals in positions.values() :
max_diff.append( max(vals) - min(vals) )
# now return the max element from the list we have just created
if len(max_diff) :
return max(max_diff)
else :
return 0
I am unsure how to phrase a conditional loop. I would like to create a NxN array consisting of elements which are equal to either 1 or -1. I have created a 3x3 array to begin with, which when printed consists of numbers between 0 and 1.
col = 3
row = 3
mymatrix = np.random.rand(col,row)
I would like to set their values to either 1 or -1, depending on whether the value they have been assigned is greater than or equal to 0.5.
for i in range(0,2):
for j in range(0,2):
if mymatrix[i,j] <= 0.5:
mymatrix[i,j] = -1.0
elif mymatrix[i,j] >= 0.5:
mymatrix[i,j] = 1.0
print mymatrix
For some reason, this prints a 3x3 matrix with only the inner 4 elements as 1 or -1.
[[-1, 1,0.19343979]
[ 1, -1, 0.59891168]
[ 0.02766664, 0.73244162, 0.41541223]]
I know that mymatrix[2,2] is 0.41541223, so I cannot understand why this is not being looped through.
Your Problem
You need to loop over all rows and columns:
for i in range(row):
for j in range(col):
You only loop over the first two:
col = 3
row = 3
In Python range(start, end) gives a range starting form start inclusive and end exclusive.
for i in range(0,2):
for j in range(0,2):
BTW, in Python rows are the first dimension so this would be better:
mymatrix = np.random.rand(row, col)
even though both row and col have the same value in your case.
Better Solution
Working with NumPy arrays, you should avoid loops as much as possible,
because loops can be really slow. To my experience those loops over NumPy arrays can be even slower than loops over Python lists.
Therefore, better do:
np.where(mymatrix <= 0.5, -1, 1)
This is a vectorized version that is much faster.
One of the goals using NumPy is often speed.
where(condition, [x, y])
Return elements, either from x or y, depending on condition.
Your error is in upper bound:
mymatrix = np.random.rand(col,row)
for i in range(0,3):
for j in range(0,3):
if mymatrix[i,j] <= 0.5:
mymatrix[i,j] = -1.0
elif mymatrix[i,j] >= 0.5:
mymatrix[i,j] = 1.0
print (mymatrix)
What I'm looking to do:
I need to make a function that, given a list of positive integers (there can be duplicate integers), counts all triples (in the list) in which the third number is a multiple of the second and the second is a multiple of the first:
(The same number cannot be used twice in one triple, but can be used by all other triples)
For example, [3, 6, 18] is one because 18 goes evenly into 6 which goes evenly into 3.
So given [1, 2, 3, 4, 5, 6] it should find:
[1, 2, 4] [1, 2, 6] [1, 3, 6]
and return 3 (the number of triples it found)
What I've tried:
I made a couple of functions that work but are not efficient enough. Is there some math concept I don't know about that would help me find these triples faster? A module with a function that does better? I don't know what to search for...
def foo(q):
l = sorted(q)
ln = range(len(l))
for x in ln:
if len(l[x:]) > 1:
for y in ln[x + 1:]:
if (len(l[y:]) > 0) and (l[y] % l[x] == 0):
for z in ln[y + 1:]:
if l[z] % l[y] == 0:
ans += 1
return ans
This one is a bit faster:
def bar(q):
l = sorted(q)
ans = 0
for x2, x in enumerate(l):
pool = l[x2 + 1:]
if len(pool) > 1:
for y2, y in enumerate(pool):
pool2 = pool[y2 + 1:]
if pool2 and (y % x == 0):
for z in pool2:
if z % y == 0:
ans += 1
return ans
Here's what I've come up with with help from y'all but I must be doing something wrong because it get's the wrong answer (it's really fast though):
def function4(numbers):
ans = 0
num_dict = {}
index = 0
for x in numbers:
index += 1
num_dict[x] = [y for y in numbers[index:] if y % x == 0]
for x in numbers:
for y in num_dict[x]:
for z in num_dict[y]:
print(x, y, z)
ans += 1
return ans
(39889 instead of 40888) - oh, I accidentally made the index var start at 1 instead of 0. It works now.
Final Edit
I've found the best way to find the number of triples by reevaluating what I needed it to do. This method doesn't actually find the triples, it just counts them.
def foo(l):
llen = len(l)
total = 0
cache = {}
for i in range(llen):
cache[i] = 0
for x in range(llen):
for y in range(x + 1, llen):
if l[y] % l[x] == 0:
cache[y] += 1
total += cache[x]
return total
And here's a version of the function that explains the thought process as it goes (not good for huge lists though because of spam prints):
def bar(l):
list_length = len(l)
total_triples = 0
cache = {}
for i in range(list_length):
cache[i] = 0
for x in range(list_length):
print("\n\nfor index[{}]: {}".format(x, l[x]))
for y in range(x + 1, list_length):
print("\n\ttry index[{}]: {}".format(y, l[y]))
if l[y] % l[x] == 0:
print("\n\t\t{} can be evenly diveded by {}".format(l[y], l[x]))
cache[y] += 1
total_triples += cache[x]
print("\t\tcache[{0}] is now {1}".format(y, cache[y]))
print("\t\tcount is now {}".format(total_triples))
print("\t\t(+{} from cache[{}])".format(cache[x], x))
else:
print("\n\t\tfalse")
print("\ntotal number of triples:", total_triples)
Right now your algorithm has O(N^3) running time, meaning that every time you double the length of the initial list the running time goes up by 8 times.
In the worst case, you cannot improve this. For example, if your numbers are all successive powers of 2, meaning that every number divides every number grater than it, then every triple of numbers is a valid solution so just to print out all the solutions is going to be just as slow as what you are doing now.
If you have a lower "density" of numbers that divide other numbers, one thing you can do to speed things up is to search for pairs of numbers instead of triples. This will take time that is only O(N^2), meaning the running time goes up by 4 times when you double the length of the input list. Once you have a list of pairs of numbers you can use it to build a list of triples.
# For simplicity, I assume that a number can't occur more than once in the list.
# You will need to tweak this algorithm to be able to deal with duplicates.
# this dictionary will map each number `n` to the list of other numbers
# that appear on the list that are multiples of `n`.
multiples = {}
for n in numbers:
multiples[n] = []
# Going through each combination takes time O(N^2)
for x in numbers:
for y in numbers:
if x != y and y % x == 0:
multiples[x].append(y)
# The speed on this last step will depend on how many numbers
# are multiples of other numbers. In the worst case this will
# be just as slow as your current algoritm. In the fastest case
# (when no numbers divide other numbers) then it will be just a
# O(N) scan for the outermost loop.
for x in numbers:
for y in multiples[x]:
for z in multiples[y]:
print(x,y,z)
There might be even faster algorithms, that also take advantage of algebraic properties of division but in your case I think a O(N^2) is probably going to be fast enough.
the key insight is:
if a divides b, it means a "fits into b".
if a doesn't divide c, then it means "a doesn't fit into c".
And if a can't fit into c, then b cannot fit into c (imagine if b fitted into c, since a fits into b, then a would fit into all the b's that fit into c and so a would have to fit into c too.. (think of prime factorisation etc))
this means that we can optimise. If we sort the numbers smallest to largest and start with the smaller numbers first. First iteration, start with the smallest number as a
If we partition the numbers into two groups, group 1, the numbers which a divides, and group 2 the group which a doesn't divide, then we know that no numbers in group 1 can divide numbers in group 2 because no numbers in group 2 have a as a factor.
so if we had [2,3,4,5,6,7], we would start with 2 and get:
[2,4,6] and [3,5,7]
we can repeat the process on each group, splitting into smaller groups. This suggests an algorithm that could count the triples more efficiently. The groups will get really small really quickly, which means its efficiency should be fairly close to the size of the output.
This is the best answer that I was able to come up with so far. It's fast, but not quite fast enough. I'm still posting it because I'm probably going to abandon this question and don't want to leave out any progress I've made.
def answer(l):
num_dict = {}
ans_set = set()
for a2, a in enumerate(l):
num_dict[(a, a2)] = []
for x2, x in enumerate(l):
for y2, y in enumerate(l):
if (y, y2) != (x, x2) and y % x == 0:
pair = (y, y2)
num_dict[(x, x2)].append(pair)
for x in num_dict:
for y in num_dict[x]:
for z in num_dict[y]:
ans_set.add((x[0], y[0], z[0]))
return len(ans_set)
I have a text files that lists pairs, for example
10,1
2,7
3,1
10,1
That has then been turned into a symmetric matrix, so the (1,10) entry is the number of times the pair (1,10) showed up on the list. I would now like to subsample this matrix. By subsample I mean - I would like to make a matrix that would have been the result of only using a random 30% of the lines in the original text file. So in this example, had I erased 70% of the text file, the (1,10) pair might only show up once instead of twice, and so the (1,10) entry in the matrix would be 1 instead of 2.
This can be done easily if I actually have the original text file, by just using random.sample to pick out 30% of the lines in the files. But if I only have the matrix, how can I randomly decimate 70% of the data?
I guess the best way depends on where your data is large:
Do you have a huge matrix, with mostly small counts in it? or
Do you have a moderately sized matrix with huge numbers of counts in it?
Here's a solution that will be suited to the second case, though it will also work
OK in the first case.
Basically, the fact that the counts happen to be in a 2D matrix is not so
important: this is basically the problem of sampling from a population that has
been binned. So what we can do is extract the bins directly, and forget about the
matrix for a bit:
import numpy as np
import random
# Input counts matrix
mat = np.array([
[5, 5, 2],
[1, 1, 3],
[6, 0, 4]
], dtype=np.int64)
# Build a list of (row,col) pairs, and a list of counts
keys, counts = zip(*[
((i,j), mat[i,j])
for i in range(mat.shape[0])
for j in range(mat.shape[1])
if mat[i,j] > 0
])
And then sample from those bins, using a cumulative array of counts:
# Make the cumulative counts array
counts = np.array(counts, dtype=np.int64)
sum_counts = np.cumsum(counts)
# Decide how many counts to include in the sample
frac_select = 0.30
count_select = int(sum_counts[-1] * frac_select)
# Choose unique counts
ind_select = sorted(random.sample(xrange(sum_counts[-1]), count_select))
# A vector to hold the new counts
out_counts = np.zeros(counts.shape, dtype=np.int64)
# Perform basically the merge step of merge-sort, finding where
# the counts land in the cumulative array
i = 0
j = 0
while i<len(sum_counts) and j<len(ind_select):
if ind_select[j] < sum_counts[i]:
j += 1
out_counts[i] += 1
else:
i += 1
# Rebuild the matrix using the `keys` list from before
out_mat = np.zeros(mat.shape, dtype=np.int64)
for i in range(len(out_counts)):
out_mat[keys[i]] = out_counts[i]
Now you will have the sampled matrix in out_mat.
Unfortunately example two and three do not observe correct distribution according to the number of appearances of lines in the original file.
Instead of removing tuples from the original data you could randomly remove counts from your matrix.
So you have to generate random indices and decrease the corresponding count. Be sure to avoid decreasing a zero count and instead generate a new index. Do this until you have decreased the overall amount of counted tuples to 30%.
Basically this could look like this:
amount_to_decrease = 0.7 * overall_amount
decreased = 0
while decreased < amount_to_decrease:
x = random.randint(0, n)
y = random.randint(0, n)
if matrix[x][y] > 0:
matrix[x][y]-=1
decreased+=1
if x != y:
matrix[y][x]-=1
This should work well if your matrix is well populated.
If it's not you might want to recreate a list of tuples from the matrix and then choose a random subset from that. After this recreate your matrix from the remaining tuples:
tuples = []
for y in range(n):
for x in range(y+1):
for _ in range(matrix[x][y])
tuples.append((x,y))
remaining = random.sample(tuples, int(overall_amount*0.7) )
Or you can do a combination where you do a first pass to find all indices that are not zero and then sample these to decrease the counts:
valid_indices = []
for y in range(n):
for x in range(y+1):
valid_indices.append((x,y))
amount_to_decrease = 0.7 * overall_amount
decreased = 0
while decreased < amount_to_decrease:
x,y = random.choice(valid_indices)
matrix[x][y]-=1
if x != y:
matrix[y][x]-=1
if matrix[y][x] == 0:
valid_indices.remove((x,y))
There is another approach that would use the right possibilities but might not give you an exact reduction. The idea is to set a probability for keeping a line/count. This could be 0.3 if you are aiming for a reduction to 30%. Then you can go over the matrix and check for every count if it should be kept or not.
keep_chance = 0.3
for y in range(n):
for x in range(y+1):
for _ in range(matrix[x][y])
if random.random() > keep_chance:
matrix[x][y] -= 1
if x != y:
matrix[y][x]-=1
Assuming that the couples 1,10 and 10,1 are different, so that mat[1][10] is not necessarily the same as mat[10][1] (if not, read below the line)
First compute the sum of all the values in the matrix.
Let this sum be S. This counts the number of rows in the file.
Let x and y the dimensions of the matrix.
Now loop for n from 0 to [70% of S]:
pick a random integer between 1 and x. let this be j
pick a random integer between 1 and y. let this be k
if mat[j][k] > 0, decrease mat[j][k] and do n++
Since you increase a single value in the matrix for each row in your file, decreasing randomly a positive value in the matrix is the same as decimating the rows in the file.
If 10,1 is the same of 1,10 you don't need half of the matrix, so you can change the algorithm like this:
Loop for n from 0 to [70% of S]:
pick a random integer between 1 and x. Let this be j
pick a random integer between 1 and k. Let this be k
if mat[j][k] > 0, decrease mat[j][k] and do n++
is there a good algorithm for checking whether there are 5 same elements in a row or a column or diagonally given a square matrix, say 6x6?
there is ofcourse the naive algorithm of iterating through every spot and then for each point in the matrix, iterate through that row, col and then the diagonal. I am wondering if there is a better way of doing it.
You could keep a histogram in a dictionary (mapping element type -> int). And then you iterate over your row or column or diagonal, and increment histogram[element], and either check at the end to see if you have any 5s in the histogram, or if you can allow more than 5 copies, you can just stop once you've reached 5 for any element.
Simple, one-dimensional, example:
m = ['A', 'A', 'A', 'A', 'B', 'A']
h = {}
for x in m:
if x in h:
h[x] += 1
else:
h[x] = 1
print "Histogram:", h
for k in h:
if h[k]>=5:
print "%s appears %d times." % (k,h[k])
Output:
Histogram: {'A': 5, 'B': 1}
A appears 5 times.
Essentially, h[x] will store the number of times the element x appears in the array (in your case, this will be the current row, or column or diagonal). The elements don't have to appear consecutively, but the counts would be reset each time you start considering a new row/column/diagonal.
You can check whether there are k same elements in a matrix of integers in a single pass.
Suppose that n is the size of the matrix and m is the largest element. We have n column, n row and 1 diagonal.
Foreach column, row or diagonal we have at most n distinct element.
Now we can create a histogram containing (n + n + 1) * (2 * m + 1) element. Representing
the rows, columns and the diagonal each of them containing at most n distinct element.
size = (n + n + 1) * (2 * m + 1)
histogram = zeros(size, Int)
Now the tricky part is how to update this histogram ?
Consider this function in pseudo-code:
updateHistogram(i, j, element)
if (element < 0)
element = m - element;
rowIndex = i * m + element
columnIndex = n * m + j * m + element
diagonalIndex = 2 * n * m + element
histogram[rowIndex] = histogram[rowIndex] + 1
histogram[columnIndex] = histogram[columnIndex] + 1
if (i = j)
histogram[diagonalIndex] = histogram[diagonalIndex] + 1
Now all you have to do is to iterate throw the histogram and check whether there is an element > k
Your best approach may depend on whether you control the placement of elements.
For example, if you were building a game and just placed the most recent element on the grid, you could capture into four strings the vertical, horizontal, and diagonal strips that intersected that point, and use the same algorithm on each strip, tallying each element and evaluating the totals. The algorithm may be slightly different depending on whether you're counting five contiguous elements out of the six, or allow gaps as long as the total is five.
For rows you can keep a counter, which indicates how many of the same elements in a row you currently have. To do this, iterate through the row and
if current element matches the previous element, increase the counter by one. If counter is 5, then you have found the 5 elements you wanted.
if current element doesn't match previous element, set the counter to 1.
The same principle can be applied to columns and diagonals as well. You probably want to use array of counters for columns (one element for each column) and diagonals so you can iterate through the matrix once.
I did the small example for a smaller case, but you can easily change it:
n = 3
matrix = [[1, 2, 3, 4],
[1, 2, 3, 1],
[2, 3, 1, 3],
[2, 1, 4, 2]]
col_counter = [1, 1, 1, 1]
for row in range(0, len(matrix)):
row_counter = 1
for col in range(0, len(matrix[row])):
current_element = matrix[row][col]
# check elements in a same row
if col > 0:
previous_element = matrix[row][col - 1]
if current_element == previous_element:
row_counter = row_counter + 1
if row_counter == n:
print n, 'in a row at:', row, col - n + 1
else:
row_counter = 1
# check elements in a same column
if row > 0:
previous_element = matrix[row - 1][col]
if current_element == previous_element:
col_counter[col] = col_counter[col] + 1;
if col_counter[col] == n:
print n, 'in a column at:', row - n + 1, col
else:
col_counter[col] = 1
I left out diagonals to keep the example short and simple, but for diagonals you can use the same principle as you use on columns. The previous element would be one of the following (depending on the direction of diagonal):
matrix[row - 1][col - 1]
matrix[row - 1][col + 1]
Note that you will need to make a little bit extra effort in the second case. For example traverse the row in the inner loop from right to left.
I don't think you can avoid iteration, but you can at least do an XOR of all elements and if the result of that is 0 => they are all equal, then you don't need to do any comparisons.
You can try improve your method with some heuristics: use the knowledge of the matrix size to exclude element sequences that do not fit and suspend unnecessary calculation. In case the given vector size is 6, you want to find 5 equal elements, and the first 3 elements are different, further calculation do not have any sense.
This approach can give you a significant advantage, if 5 equal elements in a row happen rarely enough.
If you code the rows/columns/diagonals as bitmaps, "five in a row" means "mask % 31== 0 && mask / 31 == power_of_two"
00011111 := 0x1f 31 (five in a row)
00111110 := 0x3e 62 (five in a row)
00111111 := 0x3f 63 (six in a row)
If you want to treat the six-in-a-row case also as as five-in-a-row, the easiest way is probably to:
for ( ; !(mask & 1) ; mask >>= 1 ) {;}
return (mask & 0x1f == 0x1f) ? 1 : 0;
Maybe the Stanford bit-tweaking department has a better solution or suggestion that does not need looping?